(datafusion-comet) branch main updated: Add changelog for 0.9.0 (#1955)

agrove Tue, 01 Jul 2025 05:39:21 -0700

This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git



The following commit(s) were added to refs/heads/main by this push:
     new ae7a1abb6 Add changelog for 0.9.0 (#1955)
ae7a1abb6 is described below

commit ae7a1abb663001619c13d0fd3d19accce44e6fd2
Author: Andy Grove <agr...@apache.org>
AuthorDate: Tue Jul 1 06:39:10 2025 -0600

    Add changelog for 0.9.0 (#1955)
---
 dev/changelog/0.9.0.md | 210 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 210 insertions(+)

diff --git a/dev/changelog/0.9.0.md b/dev/changelog/0.9.0.md
new file mode 100644
index 000000000..12fca847d
--- /dev/null
+++ b/dev/changelog/0.9.0.md
@@ -0,0 +1,210 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# DataFusion Comet 0.9.0 Changelog
+
+This release consists of 139 commits from 24 contributors. See credits at the 
end of this changelog for more information.
+
+**Fixed bugs:**
+
+- fix: typo for `instr` in fuzz testing 
[#1686](https://github.com/apache/datafusion-comet/pull/1686) (mbutrovich)
+- fix: Bucketed scan fallback for native_datafusion Parquet scan 
[#1720](https://github.com/apache/datafusion-comet/pull/1720) (mbutrovich)
+- fix: Skip row index Spark SQL tests for native_datafusion Parquet scan 
[#1724](https://github.com/apache/datafusion-comet/pull/1724) (mbutrovich)
+- fix: Check acquired memory when CometMemoryPool grows 
[#1732](https://github.com/apache/datafusion-comet/pull/1732) (wForget)
+- fix: Fix data race in memory profiling 
[#1727](https://github.com/apache/datafusion-comet/pull/1727) (andygrove)
+- fix: Enable some DPP Spark SQL tests 
[#1734](https://github.com/apache/datafusion-comet/pull/1734) (andygrove)
+- fix: support literal null list and map 
[#1742](https://github.com/apache/datafusion-comet/pull/1742) (kazuyukitanimura)
+- fix: get_struct field is incorrect when struct in array 
[#1687](https://github.com/apache/datafusion-comet/pull/1687) (comphead)
+- fix: cast map types correctly in schema adapter 
[#1771](https://github.com/apache/datafusion-comet/pull/1771) (parthchandra)
+- fix: correct schema type checking in native_iceberg_compat 
[#1755](https://github.com/apache/datafusion-comet/pull/1755) (parthchandra)
+- fix: default values for native_datafusion scan 
[#1756](https://github.com/apache/datafusion-comet/pull/1756) (mbutrovich)
+- fix: [native_scans] Support `CASE_SENSITIVE` when reading Parquet 
[#1782](https://github.com/apache/datafusion-comet/pull/1782) (andygrove)
+- fix: cargo install tpchgen-cli in benchmark doc 
[#1797](https://github.com/apache/datafusion-comet/pull/1797) (zhuqi-lucas)
+- fix: support `map_keys` 
[#1788](https://github.com/apache/datafusion-comet/pull/1788) (comphead)
+- fix: fall back on nested types for default values 
[#1799](https://github.com/apache/datafusion-comet/pull/1799) (mbutrovich)
+- fix: Re-enable Spark 4 tests on Linux 
[#1806](https://github.com/apache/datafusion-comet/pull/1806) (andygrove)
+- fix: fallback to Spark scan if encryption is enabled 
(native_datafusion/native_iceberg_compat) 
[#1785](https://github.com/apache/datafusion-comet/pull/1785) (parthchandra)
+- fix: native_iceberg_compat: move checking parquet types above fetching batch 
[#1809](https://github.com/apache/datafusion-comet/pull/1809) (mbutrovich)
+- fix: translate missing or corrupt file exceptions, fall back if asked to 
ignore [#1765](https://github.com/apache/datafusion-comet/pull/1765) 
(mbutrovich)
+- fix: Fix Spark SQL AQE exchange reuse test failures 
[#1811](https://github.com/apache/datafusion-comet/pull/1811) (coderfender)
+- fix: Enable more Spark SQL tests 
[#1834](https://github.com/apache/datafusion-comet/pull/1834) (andygrove)
+- fix: support `map_values` 
[#1835](https://github.com/apache/datafusion-comet/pull/1835) (comphead)
+- fix: Handle case where num_cols == 0 in native execution 
[#1840](https://github.com/apache/datafusion-comet/pull/1840) (andygrove)
+- fix: Fix shuffle writing rows containing null struct fields 
[#1845](https://github.com/apache/datafusion-comet/pull/1845) (Kontinuation)
+- fix: Fall back to Spark for `RANGE BETWEEN` window expressions 
[#1848](https://github.com/apache/datafusion-comet/pull/1848) (andygrove)
+- fix: Remove COMET_SHUFFLE_FALLBACK_TO_COLUMNAR hack 
[#1865](https://github.com/apache/datafusion-comet/pull/1865) (andygrove)
+- fix: support read Struct by user schema 
[#1860](https://github.com/apache/datafusion-comet/pull/1860) (comphead)
+- fix: map parquet field_id correctly (native_iceberg_compat) 
[#1815](https://github.com/apache/datafusion-comet/pull/1815) (parthchandra)
+- fix: cast_struct_to_struct aligns to Spark behavior 
[#1879](https://github.com/apache/datafusion-comet/pull/1879) (mbutrovich)
+- fix: correctly handle schemas with nested array of struct 
(native_iceberg_compat) 
[#1883](https://github.com/apache/datafusion-comet/pull/1883) (parthchandra)
+- fix: set RangePartitioning for native shuffle default to false  
[#1907](https://github.com/apache/datafusion-comet/pull/1907) (mbutrovich)
+- fix: conflict between #1905 and #1892. 
[#1919](https://github.com/apache/datafusion-comet/pull/1919) (mbutrovich)
+- fix: Add overflow check to evaluate of sum decimal accumulator 
[#1922](https://github.com/apache/datafusion-comet/pull/1922) (leung-ming)
+- fix: Fix overflow handling when casting float to decimal 
[#1914](https://github.com/apache/datafusion-comet/pull/1914) (leung-ming)
+- fix: Ignore a test case fails on Miri 
[#1951](https://github.com/apache/datafusion-comet/pull/1951) (leung-ming)
+
+**Performance related:**
+
+- perf: Add memory profiling 
[#1702](https://github.com/apache/datafusion-comet/pull/1702) (andygrove)
+- perf: Add performance tracing capability 
[#1706](https://github.com/apache/datafusion-comet/pull/1706) (andygrove)
+- perf: Add `COMET_RESPECT_PARQUET_FILTER_PUSHDOWN` config 
[#1936](https://github.com/apache/datafusion-comet/pull/1936) (andygrove)
+
+**Implemented enhancements:**
+
+- feat: add jemalloc as optional custom allocator 
[#1679](https://github.com/apache/datafusion-comet/pull/1679) (mbutrovich)
+- feat: support `array_repeat` 
[#1680](https://github.com/apache/datafusion-comet/pull/1680) (comphead)
+- feat: More warning info for users 
[#1667](https://github.com/apache/datafusion-comet/pull/1667) (hsiang-c)
+- feat: decode() expression when using 'utf-8' encoding 
[#1697](https://github.com/apache/datafusion-comet/pull/1697) (mbutrovich)
+- feat: regexp_replace() expression with no starting offset 
[#1700](https://github.com/apache/datafusion-comet/pull/1700) (mbutrovich)
+- feat: Improve performance tracing feature 
[#1730](https://github.com/apache/datafusion-comet/pull/1730) (andygrove)
+- feat: Set/cancel with job tag and make max broadcast table size configurable 
[#1693](https://github.com/apache/datafusion-comet/pull/1693) (wForget)
+- feat: Add support for `expm1` expression from `datafusion-spark` crate 
[#1711](https://github.com/apache/datafusion-comet/pull/1711) (andygrove)
+- feat: Add config option for showing all Comet plan transformations 
[#1780](https://github.com/apache/datafusion-comet/pull/1780) (andygrove)
+- feat: Support Type widening: byte → short/int/long, short → int/long 
[#1770](https://github.com/apache/datafusion-comet/pull/1770) (huaxingao)
+- feat: Translate Hadoop S3A configurations to object_store configurations 
[#1817](https://github.com/apache/datafusion-comet/pull/1817) (Kontinuation)
+- feat: Upgrade to official DataFusion 48.0.0 release 
[#1877](https://github.com/apache/datafusion-comet/pull/1877) (andygrove)
+- feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` 
[#1747](https://github.com/apache/datafusion-comet/pull/1747) (andygrove)
+- feat: support RangePartitioning with native shuffle 
[#1862](https://github.com/apache/datafusion-comet/pull/1862) (mbutrovich)
+- feat: Add support for signum expression 
[#1889](https://github.com/apache/datafusion-comet/pull/1889) (andygrove)
+- feat: Add support to lookup map by key 
[#1898](https://github.com/apache/datafusion-comet/pull/1898) (comphead)
+- feat: support array_max 
[#1892](https://github.com/apache/datafusion-comet/pull/1892) (drexler-sky)
+- feat: pass ignore_nulls flag to first and last 
[#1866](https://github.com/apache/datafusion-comet/pull/1866) (rluvaton)
+- feat: Implement ToPrettyString 
[#1921](https://github.com/apache/datafusion-comet/pull/1921) (andygrove)
+- feat: Support hadoop s3a config in native_iceberg_compat 
[#1925](https://github.com/apache/datafusion-comet/pull/1925) (parthchandra)
+- feat: rand expression support 
[#1199](https://github.com/apache/datafusion-comet/pull/1199) (akupchinskiy)
+- feat: supports array_distinct 
[#1923](https://github.com/apache/datafusion-comet/pull/1923) (drexler-sky)
+- feat: `auto` scan mode should check for supported file location 
[#1930](https://github.com/apache/datafusion-comet/pull/1930) (andygrove)
+- feat: Encapsulate Parquet objects 
[#1920](https://github.com/apache/datafusion-comet/pull/1920) (huaxingao)
+- feat: Change default value of `COMET_NATIVE_SCAN_IMPL` to `auto` 
[#1933](https://github.com/apache/datafusion-comet/pull/1933) (andygrove)
+- feat: Supports array_union 
[#1945](https://github.com/apache/datafusion-comet/pull/1945) (drexler-sky)
+
+**Documentation updates:**
+
+- docs: Add changelog for 0.8.0 
[#1675](https://github.com/apache/datafusion-comet/pull/1675) (andygrove)
+- docs: Add instructions on running TPC-H on macOS 
[#1647](https://github.com/apache/datafusion-comet/pull/1647) (andygrove)
+- docs: Add documentation for accelerating Iceberg Parquet scans with Comet 
[#1683](https://github.com/apache/datafusion-comet/pull/1683) (andygrove)
+- docs: Add note on setting `core.abbrev` when generating diffs 
[#1735](https://github.com/apache/datafusion-comet/pull/1735) (andygrove)
+- docs: Remove outdated param in macos bench guide 
[#1748](https://github.com/apache/datafusion-comet/pull/1748) (ding-young)
+- docs: Add instructions for running individual Spark SQL tests from sbt 
[#1752](https://github.com/apache/datafusion-comet/pull/1752) (coderfender)
+- docs: Add documentation for native_datafusion Parquet scanner's S3 support 
[#1832](https://github.com/apache/datafusion-comet/pull/1832) (Kontinuation)
+- docs: Add docs stating that Comet does not support reading decimals encoded 
in Parquet BINARY format 
[#1895](https://github.com/apache/datafusion-comet/pull/1895) (andygrove)
+
+**Other:**
+
+- chore: Start 0.9.0 development 
[#1676](https://github.com/apache/datafusion-comet/pull/1676) (andygrove)
+- chore: Update viable crates 
[#1677](https://github.com/apache/datafusion-comet/pull/1677) (EmilyMatt)
+- chore: match Maven plugin versions with Spark 3.5 
[#1668](https://github.com/apache/datafusion-comet/pull/1668) (hsiang-c)
+- chore: Remove fallback reason "because the children were not native" 
[#1672](https://github.com/apache/datafusion-comet/pull/1672) (andygrove)
+- chore: Rename `scalarExprToProto` to `scalarFunctionExprToProto` 
[#1688](https://github.com/apache/datafusion-comet/pull/1688) (comphead)
+- chore: fix build errors 
[#1690](https://github.com/apache/datafusion-comet/pull/1690) (comphead)
+- chore: Make Aggregate transformation more compact 
[#1670](https://github.com/apache/datafusion-comet/pull/1670) (EmilyMatt)
+- chore: update dev/release/rat_exclude_files.txt 
[#1689](https://github.com/apache/datafusion-comet/pull/1689) (hsiang-c)
+- chore: Move Comet rules into their own files 
[#1695](https://github.com/apache/datafusion-comet/pull/1695) (andygrove)
+- chore: Remove fast encoding option 
[#1703](https://github.com/apache/datafusion-comet/pull/1703) (andygrove)
+- chore: fix CI job name 
[#1712](https://github.com/apache/datafusion-comet/pull/1712) (hsiang-c)
+- minor: Warn if memory pool is dropped with bytes still reserved 
[#1721](https://github.com/apache/datafusion-comet/pull/1721) (andygrove)
+- chore: Correct memory acquired size in unified memory pool 
[#1738](https://github.com/apache/datafusion-comet/pull/1738) (zuston)
+- chore: allow large errors for Clippy 
[#1743](https://github.com/apache/datafusion-comet/pull/1743) (comphead)
+- chore: Refactor DataTypeSupport 
[#1741](https://github.com/apache/datafusion-comet/pull/1741) (andygrove)
+- chore: More refactoring of type checking logic 
[#1744](https://github.com/apache/datafusion-comet/pull/1744) (andygrove)
+- chore: Enable more complex type tests 
[#1753](https://github.com/apache/datafusion-comet/pull/1753) (andygrove)
+- chore: Add `scanImpl` attribute to `CometScanExec` 
[#1746](https://github.com/apache/datafusion-comet/pull/1746) (andygrove)
+- chore: Prepare for DataFusion 48.0.0 
[#1710](https://github.com/apache/datafusion-comet/pull/1710) (andygrove)
+- Docs: Setup Comet on IntelliJ  
[#1760](https://github.com/apache/datafusion-comet/pull/1760) (coderfender)
+- chore: Reenable nested types for CometFuzzTestSuite with int96 
[#1761](https://github.com/apache/datafusion-comet/pull/1761) (mbutrovich)
+- chore: Enable partial Spark SQL tests for `native_iceberg_compat` scan 
[#1762](https://github.com/apache/datafusion-comet/pull/1762) (andygrove)
+- chore: [native_iceberg_compat / native_datafusion] Ignore Spark SQL Parquet 
encryption tests [#1763](https://github.com/apache/datafusion-comet/pull/1763) 
(andygrove)
+- build: Ignore array_repeat test to fix CI issues 
[#1774](https://github.com/apache/datafusion-comet/pull/1774) (andygrove)
+- chore: Upload crash logs if Java tests fail 
[#1779](https://github.com/apache/datafusion-comet/pull/1779) (andygrove)
+- chore: Drop support for Java 8 
[#1777](https://github.com/apache/datafusion-comet/pull/1777) (andygrove)
+- chore: Bump arrow to 18.3.0 
[#1773](https://github.com/apache/datafusion-comet/pull/1773) (Kontinuation)
+- build: Stop running Comet's Spark 4 tests on Linux for PR builds 
[#1802](https://github.com/apache/datafusion-comet/pull/1802) (andygrove)
+- Chore: Moved strings expressions to separate file 
[#1792](https://github.com/apache/datafusion-comet/pull/1792) (kazantsev-maksim)
+- chore: Speed up "PR Builds" CI workflows 
[#1807](https://github.com/apache/datafusion-comet/pull/1807) (andygrove)
+- chore: [native scans] Ignore Spark SQL test for string predicate pushdown 
[#1768](https://github.com/apache/datafusion-comet/pull/1768) (andygrove)
+- chore: Bump DataFusion to git rev 2c2f225 
[#1814](https://github.com/apache/datafusion-comet/pull/1814) (andygrove)
+- Feat: support bit_count function 
[#1602](https://github.com/apache/datafusion-comet/pull/1602) (kazantsev-maksim)
+- Chore: implement bit_not as ScalarUDFImpl 
[#1825](https://github.com/apache/datafusion-comet/pull/1825) (kazantsev-maksim)
+- build: Specify -Dsbt.log.noformat=true in sbt CI runs 
[#1822](https://github.com/apache/datafusion-comet/pull/1822) (andygrove)
+- chore: Use unique artifact names in Java test run 
[#1818](https://github.com/apache/datafusion-comet/pull/1818) (andygrove)
+- minor: Refactor PhysicalPlanner::default() to avoid duplicate code 
[#1821](https://github.com/apache/datafusion-comet/pull/1821) (andygrove)
+- Chore: implement bit_count as ScalarUDFImpl 
[#1826](https://github.com/apache/datafusion-comet/pull/1826) (kazantsev-maksim)
+- chore: IgnoreCometNativeScan on a few more Spark SQL tests 
[#1837](https://github.com/apache/datafusion-comet/pull/1837) (mbutrovich)
+- chore: Enable tests in RemoveRedundantProjectsSuite.scala related to issue 
#242 [#1838](https://github.com/apache/datafusion-comet/pull/1838) (rishvin)
+- minor: Replace many instances of `checkSparkAnswer` with 
`checkSparkAnswerAndOperator` 
[#1851](https://github.com/apache/datafusion-comet/pull/1851) (andygrove)
+- chore: Update documentation and ignore Spark SQL tests for known issue with 
count distinct on NaN in aggregate 
[#1847](https://github.com/apache/datafusion-comet/pull/1847) (andygrove)
+- chore: Ignore Spark SQL WholeStageCodegenSuite tests 
[#1859](https://github.com/apache/datafusion-comet/pull/1859) (andygrove)
+- chore: Upgrade to DataFusion 48.0.0-rc3 
[#1863](https://github.com/apache/datafusion-comet/pull/1863) (andygrove)
+- upgraded spark 3.5.5 to 3.5.6 
[#1861](https://github.com/apache/datafusion-comet/pull/1861) (YanivKunda)
+- build: Disable some rounding tests when miri is enabled 
[#1873](https://github.com/apache/datafusion-comet/pull/1873) (andygrove)
+- chore: Enable Spark SQL tests for `native_iceberg_compat` 
[#1876](https://github.com/apache/datafusion-comet/pull/1876) (andygrove)
+- chore: Enable more Spark SQL tests 
[#1869](https://github.com/apache/datafusion-comet/pull/1869) (andygrove)
+- chore: refactor planner read schema tests 
[#1886](https://github.com/apache/datafusion-comet/pull/1886) (comphead)
+- chore: Implement date_trunc as ScalarUDFImpl 
[#1880](https://github.com/apache/datafusion-comet/pull/1880) (leung-ming)
+- Chore: implement datetime funcs as ScalarUDFImpl 
[#1874](https://github.com/apache/datafusion-comet/pull/1874) (trompa)
+- minor: Improve testing of math scalar functions 
[#1896](https://github.com/apache/datafusion-comet/pull/1896) (andygrove)
+- minor: Avoid rewriting join to unsupported join 
[#1888](https://github.com/apache/datafusion-comet/pull/1888) (andygrove)
+- chore: Enable `native_iceberg_compat` Spark SQL tests (for real, this time) 
[#1910](https://github.com/apache/datafusion-comet/pull/1910) (andygrove)
+- chore: rename makeParquetFileAllTypes to makeParquetFileAllPrimitiveTypes 
[#1905](https://github.com/apache/datafusion-comet/pull/1905) (parthchandra)
+- chore: add a test case to read from an arbitrarily complex type schema 
[#1911](https://github.com/apache/datafusion-comet/pull/1911) (parthchandra)
+- test: Trigger Spark 3.4.3 SQL tests for iceberg-compat 
[#1912](https://github.com/apache/datafusion-comet/pull/1912) (kazuyukitanimura)
+- build: Fix conflict between #1910 and #1912 
[#1924](https://github.com/apache/datafusion-comet/pull/1924) (andygrove)
+- minor: fix kube/Dockerfile build failed 
[#1918](https://github.com/apache/datafusion-comet/pull/1918) (zhangxffff)
+- chore: Improve reporting of fallback reasons for CollectLimit 
[#1694](https://github.com/apache/datafusion-comet/pull/1694) (andygrove)
+- chore: move udf registration to better place 
[#1899](https://github.com/apache/datafusion-comet/pull/1899) (rluvaton)
+- chore: Comet + Iceberg (1.8.1) CI 
[#1715](https://github.com/apache/datafusion-comet/pull/1715) (hsiang-c)
+- chore: Introduce `exprHandlers` map in QueryPlanSerde 
[#1903](https://github.com/apache/datafusion-comet/pull/1903) (andygrove)
+- chore: Enable Spark SQL tests for auto scan mode 
[#1885](https://github.com/apache/datafusion-comet/pull/1885) (andygrove)
+- Feat: support bit_get function 
[#1713](https://github.com/apache/datafusion-comet/pull/1713) (kazantsev-maksim)
+- chore: Clippy fixes for Rust 1.88 
[#1939](https://github.com/apache/datafusion-comet/pull/1939) (andygrove)
+- Minor: Add unit tests for `ceil`/`floor` functions 
[#1728](https://github.com/apache/datafusion-comet/pull/1728) (tlm365)
+
+## Credits
+
+Thank you to everyone who contributed to this release. Here is a breakdown of 
commits (PRs merged) per contributor.
+
+```
+    62 Andy Grove
+    16 Matt Butrovich
+    10 Oleks V
+     8 Parth Chandra
+     5 Kazantsev Maksim
+     5 hsiang-c
+     4 Kristin Cowalcijk
+     4 Leung Ming
+     3 B Vadlamani
+     3 drexler-sky
+     2 Emily Matheys
+     2 Huaxin Gao
+     2 KAZUYUKI TANIMURA
+     2 Raz Luvaton
+     2 Zhen Wang
+     1 Artem Kupchinskiy
+     1 Junfan Zhang
+     1 Qi Zhu
+     1 Rishab Joshi
+     1 Tai Le Manh
+     1 Yaniv Kunda
+     1 Zhang Xiaofeng
+     1 ding-young
+     1 trompa
+```
+
+Thank you also to everyone who contributed in other ways such as filing 
issues, reviewing PRs, and providing feedback on this release.
+


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org

(datafusion-comet) branch main updated: Add changelog for 0.9.0 (#1955)

Reply via email to