This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git

The following commit(s) were added to refs/heads/main by this push:
     new ae7a1abb6 Add changelog for 0.9.0 (#1955)

ae7a1abb6 is described below

commit ae7a1abb663001619c13d0fd3d19accce44e6fd2
Author: Andy Grove <agr...@apache.org>
AuthorDate: Tue Jul 1 06:39:10 2025 -0600

    Add changelog for 0.9.0 (#1955)
---
 dev/changelog/0.9.0.md | 210 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 210 insertions(+)

diff --git a/dev/changelog/0.9.0.md b/dev/changelog/0.9.0.md
new file mode 100644
index 000000000..12fca847d
--- /dev/null
+++ b/dev/changelog/0.9.0.md
@@ -0,0 +1,210 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# DataFusion Comet 0.9.0 Changelog
+
+This release consists of 139 commits from 24 contributors. See credits at the end of this changelog for more information.
+
+**Fixed bugs:**
+
+- fix: typo for `instr` in fuzz testing [#1686](https://github.com/apache/datafusion-comet/pull/1686) (mbutrovich)
+- fix: Bucketed scan fallback for native_datafusion Parquet scan [#1720](https://github.com/apache/datafusion-comet/pull/1720) (mbutrovich)
+- fix: Skip row index Spark SQL tests for native_datafusion Parquet scan [#1724](https://github.com/apache/datafusion-comet/pull/1724) (mbutrovich)
+- fix: Check acquired memory when CometMemoryPool grows [#1732](https://github.com/apache/datafusion-comet/pull/1732) (wForget)
+- fix: Fix data race in memory profiling [#1727](https://github.com/apache/datafusion-comet/pull/1727) (andygrove)
+- fix: Enable some DPP Spark SQL tests [#1734](https://github.com/apache/datafusion-comet/pull/1734) (andygrove)
+- fix: support literal null list and map [#1742](https://github.com/apache/datafusion-comet/pull/1742) (kazuyukitanimura)
+- fix: get_struct field is incorrect when struct in array [#1687](https://github.com/apache/datafusion-comet/pull/1687) (comphead)
+- fix: cast map types correctly in schema adapter [#1771](https://github.com/apache/datafusion-comet/pull/1771) (parthchandra)
+- fix: correct schema type checking in native_iceberg_compat [#1755](https://github.com/apache/datafusion-comet/pull/1755) (parthchandra)
+- fix: default values for native_datafusion scan [#1756](https://github.com/apache/datafusion-comet/pull/1756) (mbutrovich)
+- fix: [native_scans] Support `CASE_SENSITIVE` when reading Parquet [#1782](https://github.com/apache/datafusion-comet/pull/1782) (andygrove)
+- fix: cargo install tpchgen-cli in benchmark doc [#1797](https://github.com/apache/datafusion-comet/pull/1797) (zhuqi-lucas)
+- fix: support `map_keys` [#1788](https://github.com/apache/datafusion-comet/pull/1788) (comphead)
+- fix: fall back on nested types for default values [#1799](https://github.com/apache/datafusion-comet/pull/1799) (mbutrovich)
+- fix: Re-enable Spark 4 tests on Linux [#1806](https://github.com/apache/datafusion-comet/pull/1806) (andygrove)
+- fix: fallback to Spark scan if encryption is enabled (native_datafusion/native_iceberg_compat) [#1785](https://github.com/apache/datafusion-comet/pull/1785) (parthchandra)
+- fix: native_iceberg_compat: move checking parquet types above fetching batch [#1809](https://github.com/apache/datafusion-comet/pull/1809) (mbutrovich)
+- fix: translate missing or corrupt file exceptions, fall back if asked to ignore [#1765](https://github.com/apache/datafusion-comet/pull/1765) (mbutrovich)
+- fix: Fix Spark SQL AQE exchange reuse test failures [#1811](https://github.com/apache/datafusion-comet/pull/1811) (coderfender)
+- fix: Enable more Spark SQL tests [#1834](https://github.com/apache/datafusion-comet/pull/1834) (andygrove)
+- fix: support `map_values` [#1835](https://github.com/apache/datafusion-comet/pull/1835) (comphead)
+- fix: Handle case where num_cols == 0 in native execution [#1840](https://github.com/apache/datafusion-comet/pull/1840) (andygrove)
+- fix: Fix shuffle writing rows containing null struct fields [#1845](https://github.com/apache/datafusion-comet/pull/1845) (Kontinuation)
+- fix: Fall back to Spark for `RANGE BETWEEN` window expressions [#1848](https://github.com/apache/datafusion-comet/pull/1848) (andygrove)
+- fix: Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR hack [#1865](https://github.com/apache/datafusion-comet/pull/1865) (andygrove)
+- fix: support read Struct by user schema [#1860](https://github.com/apache/datafusion-comet/pull/1860) (comphead)
+- fix: map parquet field_id correctly (native_iceberg_compat) [#1815](https://github.com/apache/datafusion-comet/pull/1815) (parthchandra)
+- fix: cast_struct_to_struct aligns to Spark behavior [#1879](https://github.com/apache/datafusion-comet/pull/1879) (mbutrovich)
+- fix: correctly handle schemas with nested array of struct (native_iceberg_compat) [#1883](https://github.com/apache/datafusion-comet/pull/1883) (parthchandra)
+- fix: set RangePartitioning for native shuffle default to false [#1907](https://github.com/apache/datafusion-comet/pull/1907) (mbutrovich)
+- fix: conflict between #1905 and #1892. [#1919](https://github.com/apache/datafusion-comet/pull/1919) (mbutrovich)
+- fix: Add overflow check to evaluate of sum decimal accumulator [#1922](https://github.com/apache/datafusion-comet/pull/1922) (leung-ming)
+- fix: Fix overflow handling when casting float to decimal [#1914](https://github.com/apache/datafusion-comet/pull/1914) (leung-ming)
+- fix: Ignore a test case fails on Miri [#1951](https://github.com/apache/datafusion-comet/pull/1951) (leung-ming)
+
+**Performance related:**
+
+- perf: Add memory profiling [#1702](https://github.com/apache/datafusion-comet/pull/1702) (andygrove)
+- perf: Add performance tracing capability [#1706](https://github.com/apache/datafusion-comet/pull/1706) (andygrove)
+- perf: Add `COMET_RESPECT_PARQUET_FILTER_PUSHDOWN` config [#1936](https://github.com/apache/datafusion-comet/pull/1936) (andygrove)
+
+**Implemented enhancements:**
+
+- feat: add jemalloc as optional custom allocator [#1679](https://github.com/apache/datafusion-comet/pull/1679) (mbutrovich)
+- feat: support `array_repeat` [#1680](https://github.com/apache/datafusion-comet/pull/1680) (comphead)
+- feat: More warning info for users [#1667](https://github.com/apache/datafusion-comet/pull/1667) (hsiang-c)
+- feat: decode() expression when using 'utf-8' encoding [#1697](https://github.com/apache/datafusion-comet/pull/1697) (mbutrovich)
+- feat: regexp_replace() expression with no starting offset [#1700](https://github.com/apache/datafusion-comet/pull/1700) (mbutrovich)
+- feat: Improve performance tracing feature [#1730](https://github.com/apache/datafusion-comet/pull/1730) (andygrove)
+- feat: Set/cancel with job tag and make max broadcast table size configurable [#1693](https://github.com/apache/datafusion-comet/pull/1693) (wForget)
+- feat: Add support for `expm1` expression from `datafusion-spark` crate [#1711](https://github.com/apache/datafusion-comet/pull/1711) (andygrove)
+- feat: Add config option for showing all Comet plan transformations [#1780](https://github.com/apache/datafusion-comet/pull/1780) (andygrove)
+- feat: Support Type widening: byte → short/int/long, short → int/long [#1770](https://github.com/apache/datafusion-comet/pull/1770) (huaxingao)
+- feat: Translate Hadoop S3A configurations to object_store configurations [#1817](https://github.com/apache/datafusion-comet/pull/1817) (Kontinuation)
+- feat: Upgrade to official DataFusion 48.0.0 release [#1877](https://github.com/apache/datafusion-comet/pull/1877) (andygrove)
+- feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` [#1747](https://github.com/apache/datafusion-comet/pull/1747) (andygrove)
+- feat: support RangePartitioning with native shuffle [#1862](https://github.com/apache/datafusion-comet/pull/1862) (mbutrovich)
+- feat: Add support for signum expression [#1889](https://github.com/apache/datafusion-comet/pull/1889) (andygrove)
+- feat: Add support to lookup map by key [#1898](https://github.com/apache/datafusion-comet/pull/1898) (comphead)
+- feat: support array_max [#1892](https://github.com/apache/datafusion-comet/pull/1892) (drexler-sky)
+- feat: pass ignore_nulls flag to first and last [#1866](https://github.com/apache/datafusion-comet/pull/1866) (rluvaton)
+- feat: Implement ToPrettyString [#1921](https://github.com/apache/datafusion-comet/pull/1921) (andygrove)
+- feat: Support hadoop s3a config in native_iceberg_compat [#1925](https://github.com/apache/datafusion-comet/pull/1925) (parthchandra)
+- feat: rand expression support [#1199](https://github.com/apache/datafusion-comet/pull/1199) (akupchinskiy)
+- feat: supports array_distinct [#1923](https://github.com/apache/datafusion-comet/pull/1923) (drexler-sky)
+- feat: `auto` scan mode should check for supported file location [#1930](https://github.com/apache/datafusion-comet/pull/1930) (andygrove)
+- feat: Encapsulate Parquet objects [#1920](https://github.com/apache/datafusion-comet/pull/1920) (huaxingao)
+- feat: Change default value of `COMET_NATIVE_SCAN_IMPL` to `auto` [#1933](https://github.com/apache/datafusion-comet/pull/1933) (andygrove)
+- feat: Supports array_union [#1945](https://github.com/apache/datafusion-comet/pull/1945) (drexler-sky)
+
+**Documentation updates:**
+
+- docs: Add changelog for 0.8.0 [#1675](https://github.com/apache/datafusion-comet/pull/1675) (andygrove)
+- docs: Add instructions on running TPC-H on macOS [#1647](https://github.com/apache/datafusion-comet/pull/1647) (andygrove)
+- docs: Add documentation for accelerating Iceberg Parquet scans with Comet [#1683](https://github.com/apache/datafusion-comet/pull/1683) (andygrove)
+- docs: Add note on setting `core.abbrev` when generating diffs [#1735](https://github.com/apache/datafusion-comet/pull/1735) (andygrove)
+- docs: Remove outdated param in macos bench guide [#1748](https://github.com/apache/datafusion-comet/pull/1748) (ding-young)
+- docs: Add instructions for running individual Spark SQL tests from sbt [#1752](https://github.com/apache/datafusion-comet/pull/1752) (coderfender)
+- docs: Add documentation for native_datafusion Parquet scanner's S3 support [#1832](https://github.com/apache/datafusion-comet/pull/1832) (Kontinuation)
+- docs: Add docs stating that Comet does not support reading decimals encoded in Parquet BINARY format [#1895](https://github.com/apache/datafusion-comet/pull/1895) (andygrove)
+
+**Other:**
+
+- chore: Start 0.9.0 development [#1676](https://github.com/apache/datafusion-comet/pull/1676) (andygrove)
+- chore: Update viable crates [#1677](https://github.com/apache/datafusion-comet/pull/1677) (EmilyMatt)
+- chore: match Maven plugin versions with Spark 3.5 [#1668](https://github.com/apache/datafusion-comet/pull/1668) (hsiang-c)
+- chore: Remove fallback reason "because the children were not native" [#1672](https://github.com/apache/datafusion-comet/pull/1672) (andygrove)
+- chore: Rename `scalarExprToProto` to `scalarFunctionExprToProto` [#1688](https://github.com/apache/datafusion-comet/pull/1688) (comphead)
+- chore: fix build errors [#1690](https://github.com/apache/datafusion-comet/pull/1690) (comphead)
+- chore: Make Aggregate transformation more compact [#1670](https://github.com/apache/datafusion-comet/pull/1670) (EmilyMatt)
+- chore: update dev/release/rat_exclude_files.txt [#1689](https://github.com/apache/datafusion-comet/pull/1689) (hsiang-c)
+- chore: Move Comet rules into their own files [#1695](https://github.com/apache/datafusion-comet/pull/1695) (andygrove)
+- chore: Remove fast encoding option [#1703](https://github.com/apache/datafusion-comet/pull/1703) (andygrove)
+- chore: fix CI job name [#1712](https://github.com/apache/datafusion-comet/pull/1712) (hsiang-c)
+- minor: Warn if memory pool is dropped with bytes still reserved [#1721](https://github.com/apache/datafusion-comet/pull/1721) (andygrove)
+- chore: Correct memory acquired size in unified memory pool [#1738](https://github.com/apache/datafusion-comet/pull/1738) (zuston)
+- chore: allow large errors for Clippy [#1743](https://github.com/apache/datafusion-comet/pull/1743) (comphead)
+- chore: Refactor DataTypeSupport [#1741](https://github.com/apache/datafusion-comet/pull/1741) (andygrove)
+- chore: More refactoring of type checking logic [#1744](https://github.com/apache/datafusion-comet/pull/1744) (andygrove)
+- chore: Enable more complex type tests [#1753](https://github.com/apache/datafusion-comet/pull/1753) (andygrove)
+- chore: Add `scanImpl` attribute to `CometScanExec` [#1746](https://github.com/apache/datafusion-comet/pull/1746) (andygrove)
+- chore: Prepare for DataFusion 48.0.0 [#1710](https://github.com/apache/datafusion-comet/pull/1710) (andygrove)
+- Docs: Setup Comet on IntelliJ [#1760](https://github.com/apache/datafusion-comet/pull/1760) (coderfender)
+- chore: Reenable nested types for CometFuzzTestSuite with int96 [#1761](https://github.com/apache/datafusion-comet/pull/1761) (mbutrovich)
+- chore: Enable partial Spark SQL tests for `native_iceberg_compat` scan [#1762](https://github.com/apache/datafusion-comet/pull/1762) (andygrove)
+- chore: [native_iceberg_compat / native_datafusion] Ignore Spark SQL Parquet encryption tests [#1763](https://github.com/apache/datafusion-comet/pull/1763) (andygrove)
+- build: Ignore array_repeat test to fix CI issues [#1774](https://github.com/apache/datafusion-comet/pull/1774) (andygrove)
+- chore: Upload crash logs if Java tests fail [#1779](https://github.com/apache/datafusion-comet/pull/1779) (andygrove)
+- chore: Drop support for Java 8 [#1777](https://github.com/apache/datafusion-comet/pull/1777) (andygrove)
+- chore: Bump arrow to 18.3.0 [#1773](https://github.com/apache/datafusion-comet/pull/1773) (Kontinuation)
+- build: Stop running Comet's Spark 4 tests on Linux for PR builds [#1802](https://github.com/apache/datafusion-comet/pull/1802) (andygrove)
+- Chore: Moved strings expressions to separate file [#1792](https://github.com/apache/datafusion-comet/pull/1792) (kazantsev-maksim)
+- chore: Speed up "PR Builds" CI workflows [#1807](https://github.com/apache/datafusion-comet/pull/1807) (andygrove)
+- chore: [native scans] Ignore Spark SQL test for string predicate pushdown [#1768](https://github.com/apache/datafusion-comet/pull/1768) (andygrove)
+- chore: Bump DataFusion to git rev 2c2f225 [#1814](https://github.com/apache/datafusion-comet/pull/1814) (andygrove)
+- Feat: support bit_count function [#1602](https://github.com/apache/datafusion-comet/pull/1602) (kazantsev-maksim)
+- Chore: implement bit_not as ScalarUDFImpl [#1825](https://github.com/apache/datafusion-comet/pull/1825) (kazantsev-maksim)
+- build: Specify -Dsbt.log.noformat=true in sbt CI runs [#1822](https://github.com/apache/datafusion-comet/pull/1822) (andygrove)
+- chore: Use unique artifact names in Java test run [#1818](https://github.com/apache/datafusion-comet/pull/1818) (andygrove)
+- minor: Refactor PhysicalPlanner::default() to avoid duplicate code [#1821](https://github.com/apache/datafusion-comet/pull/1821) (andygrove)
+- Chore: implement bit_count as ScalarUDFImpl [#1826](https://github.com/apache/datafusion-comet/pull/1826) (kazantsev-maksim)
+- chore: IgnoreCometNativeScan on a few more Spark SQL tests [#1837](https://github.com/apache/datafusion-comet/pull/1837) (mbutrovich)
+- chore: Enable tests in RemoveRedundantProjectsSuite.scala related to issue #242 [#1838](https://github.com/apache/datafusion-comet/pull/1838) (rishvin)
+- minor: Replace many instances of `checkSparkAnswer` with `checkSparkAnswerAndOperator` [#1851](https://github.com/apache/datafusion-comet/pull/1851) (andygrove)
+- chore: Update documentation and ignore Spark SQL tests for known issue with count distinct on NaN in aggregate [#1847](https://github.com/apache/datafusion-comet/pull/1847) (andygrove)
+- chore: Ignore Spark SQL WholeStageCodegenSuite tests [#1859](https://github.com/apache/datafusion-comet/pull/1859) (andygrove)
+- chore: Upgrade to DataFusion 48.0.0-rc3 [#1863](https://github.com/apache/datafusion-comet/pull/1863) (andygrove)
+- upgraded spark 3.5.5 to 3.5.6 [#1861](https://github.com/apache/datafusion-comet/pull/1861) (YanivKunda)
+- build: Disable some rounding tests when miri is enabled [#1873](https://github.com/apache/datafusion-comet/pull/1873) (andygrove)
+- chore: Enable Spark SQL tests for `native_iceberg_compat` [#1876](https://github.com/apache/datafusion-comet/pull/1876) (andygrove)
+- chore: Enable more Spark SQL tests [#1869](https://github.com/apache/datafusion-comet/pull/1869) (andygrove)
+- chore: refactor planner read schema tests [#1886](https://github.com/apache/datafusion-comet/pull/1886) (comphead)
+- chore: Implement date_trunc as ScalarUDFImpl [#1880](https://github.com/apache/datafusion-comet/pull/1880) (leung-ming)
+- Chore: implement datetime funcs as ScalarUDFImpl [#1874](https://github.com/apache/datafusion-comet/pull/1874) (trompa)
+- minor: Improve testing of math scalar functions [#1896](https://github.com/apache/datafusion-comet/pull/1896) (andygrove)
+- minor: Avoid rewriting join to unsupported join [#1888](https://github.com/apache/datafusion-comet/pull/1888) (andygrove)
+- chore: Enable `native_iceberg_compat` Spark SQL tests (for real, this time) [#1910](https://github.com/apache/datafusion-comet/pull/1910) (andygrove)
+- chore: rename makeParquetFileAllTypes to makeParquetFileAllPrimitiveTypes [#1905](https://github.com/apache/datafusion-comet/pull/1905) (parthchandra)
+- chore: add a test case to read from an arbitrarily complex type schema [#1911](https://github.com/apache/datafusion-comet/pull/1911) (parthchandra)
+- test: Trigger Spark 3.4.3 SQL tests for iceberg-compat [#1912](https://github.com/apache/datafusion-comet/pull/1912) (kazuyukitanimura)
+- build: Fix conflict between #1910 and #1912 [#1924](https://github.com/apache/datafusion-comet/pull/1924) (andygrove)
+- minor: fix kube/Dockerfile build failed [#1918](https://github.com/apache/datafusion-comet/pull/1918) (zhangxffff)
+- chore: Improve reporting of fallback reasons for CollectLimit [#1694](https://github.com/apache/datafusion-comet/pull/1694) (andygrove)
+- chore: move udf registration to better place [#1899](https://github.com/apache/datafusion-comet/pull/1899) (rluvaton)
+- chore: Comet + Iceberg (1.8.1) CI [#1715](https://github.com/apache/datafusion-comet/pull/1715) (hsiang-c)
+- chore: Introduce `exprHandlers` map in QueryPlanSerde [#1903](https://github.com/apache/datafusion-comet/pull/1903) (andygrove)
+- chore: Enable Spark SQL tests for auto scan mode [#1885](https://github.com/apache/datafusion-comet/pull/1885) (andygrove)
+- Feat: support bit_get function [#1713](https://github.com/apache/datafusion-comet/pull/1713) (kazantsev-maksim)
+- chore: Clippy fixes for Rust 1.88 [#1939](https://github.com/apache/datafusion-comet/pull/1939) (andygrove)
+- Minor: Add unit tests for `ceil`/`floor` functions [#1728](https://github.com/apache/datafusion-comet/pull/1728) (tlm365)
+
+## Credits
+
+Thank you to everyone who contributed to this release. Here is a breakdown of commits (PRs merged) per contributor.
+
+```
+    62  Andy Grove
+    16  Matt Butrovich
+    10  Oleks V
+     8  Parth Chandra
+     5  Kazantsev Maksim
+     5  hsiang-c
+     4  Kristin Cowalcijk
+     4  Leung Ming
+     3  B Vadlamani
+     3  drexler-sky
+     2  Emily Matheys
+     2  Huaxin Gao
+     2  KAZUYUKI TANIMURA
+     2  Raz Luvaton
+     2  Zhen Wang
+     1  Artem Kupchinskiy
+     1  Junfan Zhang
+     1  Qi Zhu
+     1  Rishab Joshi
+     1  Tai Le Manh
+     1  Yaniv Kunda
+     1  Zhang Xiaofeng
+     1  ding-young
+     1  trompa
+```
+
+Thank you also to everyone who contributed in other ways such as filing issues, reviewing PRs, and providing feedback on this release.
+

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org