This is an automated email from the ASF dual-hosted git repository.

agrove pushed a commit to branch comet-parquet-exec
in repository https://gitbox.apache.org/repos/asf/datafusion-comet.git


The following commit(s) were added to refs/heads/comet-parquet-exec by this 
push:
     new 85fe884a6 chore: Merge remote-tracking branch 'apache/main' into 
comet-parquet-exec - 20240121 (#1316)
85fe884a6 is described below

commit 85fe884a6880c3651f18225ae363b9314377373e
Author: Parth Chandra <par...@apache.org>
AuthorDate: Tue Jan 21 11:46:22 2025 -0800

    chore: Merge remote-tracking branch 'apache/main' into comet-parquet-exec - 
20240121 (#1316)
    
    * feat: support array_append (#1072)
    
    * feat: support array_append
    
    * formatted code
    
    * rewrite array_append plan to match spark behaviour and fixed bug in 
QueryPlan serde
    
    * remove unwrap
    
    * Fix for Spark 3.3
    
    * refactor array_append binary expression serde code
    
    * Disabled array_append test for spark 4.0+
    
    * chore: Simplify CometShuffleMemoryAllocator to use Spark unified memory 
allocator (#1063)
    
    * docs: Update benchmarking.md (#1085)
    
    * feat: Require offHeap memory to be enabled (always use unified memory) 
(#1062)
    
    * Require offHeap memory
    
    * remove unused import
    
    * use off heap memory in stability tests
    
    * reorder imports
    
    * test: Restore one test in CometExecSuite by adding COMET_SHUFFLE_MODE 
config (#1087)
    
    * Add changelog for 0.4.0 (#1089)
    
    * chore: Prepare for 0.5.0 development (#1090)
    
    * Update version number for build
    
    * update docs
    
    * build: Skip installation of spark-integration and fuzz testing modules (#1091)
    
    * Add hint for finding the GPG key to use when publishing to maven (#1093)
    
    * docs: Update documentation for 0.4.0 release (#1096)
    
    * update TPC-H results
    
    * update Maven links
    
    * update benchmarking guide and add TPC-DS results
    
    * include q72
    
    * fix: Unsigned type related bugs (#1095)
    
    ## Which issue does this PR close?
    
    Closes https://github.com/apache/datafusion-comet/issues/1067
    
    ## Rationale for this change
    
    Bug fix. A few expressions were failing some unsigned type related tests
    
    ## What changes are included in this PR?
    
     - For `u8`/`u16`, switched to use `generate_cast_to_signed!` in order to 
copy full i16/i32 width instead of padding zeros in the higher bits
     - `u64` becomes `Decimal(20, 0)` but there was a bug in `round()`  (`>` vs 
`>=`)
    
    ## How are these changes tested?
    
    Put back tests for unsigned types
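    The `round()` boundary bug mentioned above (`>` vs `>=`) can be illustrated with a simplified sketch. This is not the actual Comet code, just a hedged example of half-up rounding on a scaled integer, where using `>` instead of `>=` would fail to round exact halves:

    ```rust
    // Simplified sketch (not Comet's implementation): half-up rounding of a
    // decimal stored as an unscaled integer, reducing `scale` to `target_scale`.
    // Assumes scale >= target_scale.
    fn round_half_up(unscaled: i64, scale: u32, target_scale: u32) -> i64 {
        let factor = 10i64.pow(scale - target_scale);
        let quotient = unscaled / factor;
        let remainder = (unscaled % factor).abs();
        // With `>` here, an exact half (e.g. 2.5 at scale 1) would not round
        // up; `>=` matches half-up behaviour.
        if remainder * 2 >= factor {
            quotient + unscaled.signum()
        } else {
            quotient
        }
    }

    fn main() {
        assert_eq!(round_half_up(25, 1, 0), 3); // 2.5 rounds up to 3
        assert_eq!(round_half_up(24, 1, 0), 2); // 2.4 rounds down to 2
        assert_eq!(round_half_up(-25, 1, 0), -3); // -2.5 rounds away from zero
    }
    ```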
    
    * chore: Include first ScanExec batch in metrics (#1105)
    
    * include first batch in ScanExec metrics
    
    * record row count metric
    
    * fix regression
    
    * chore: Improve CometScan metrics (#1100)
    
    * Add native metrics for plan creation
    
    * make messages consistent
    
    * Include get_next_batch cost in metrics
    
    * formatting
    
    * fix double count of rows
    
    * chore: Add custom metric for native shuffle fetching batches from JVM 
(#1108)
    
    * feat: support array_insert (#1073)
    
    * Part of the implementation of array_insert
    
    * Missing methods
    
    * Working version
    
    * Reformat code
    
    * Fix code-style
    
    * Add comments about spark's implementation.
    
    * Implement negative indices
    
    + fix tests for spark < 3.4
    
    * Fix code-style
    
    * Fix scalastyle
    
    * Fix tests for spark < 3.4
    
    * Fixes & tests
    
    - added test for the negative index
    - added test for the legacy spark mode
    
    * Use assume(isSpark34Plus) in tests
    
    * Test else-branch & improve coverage
    
    * Update native/spark-expr/src/list.rs
    
    Co-authored-by: Andy Grove <agr...@apache.org>
    
    * Fix fallback test
    
    In one case there is a zero in index and test fails due to spark error
    
    * Adjust the behaviour for the NULL case to Spark
    
    * Move the logic of type checking to the method
    
    * Fix code-style
    
    ---------
    
    Co-authored-by: Andy Grove <agr...@apache.org>
    
    * feat: enable decimal to decimal cast of different precision and scale 
(#1086)
    
    * enable decimal to decimal cast of different precision and scale
    
    * add more test cases for negative scale and higher precision
    
    * add check for compatibility for decimal to decimal
    
    * fix code style
    
    * Update spark/src/main/scala/org/apache/comet/expressions/CometCast.scala
    
    Co-authored-by: Andy Grove <agr...@apache.org>
    
    * fix the nit in comment
    
    ---------
    
    Co-authored-by: himadripal <h...@apple.com>
    Co-authored-by: Andy Grove <agr...@apache.org>
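    The core of a decimal-to-decimal cast with different precision and scale is rescaling the unscaled integer value. The sketch below is a hedged illustration, not Comet's implementation; a full Spark-compatible cast would also check that the result fits the target precision and round rather than truncate when reducing scale:

    ```rust
    // Hypothetical sketch: rescale the unscaled value of Decimal(p1, s1)
    // when casting to Decimal(p2, s2).
    fn rescale(unscaled: i128, from_scale: u32, to_scale: u32) -> i128 {
        if to_scale >= from_scale {
            // Increasing scale: multiply by a power of ten
            // (may overflow the target precision; unchecked here).
            unscaled * 10i128.pow(to_scale - from_scale)
        } else {
            // Decreasing scale: truncating division, for brevity.
            unscaled / 10i128.pow(from_scale - to_scale)
        }
    }

    fn main() {
        // 123.45 as Decimal(5, 2) -> Decimal(7, 4): same numeric value.
        assert_eq!(rescale(12345, 2, 4), 1234500);
        // 123.45 -> Decimal(5, 0): truncates to 123.
        assert_eq!(rescale(12345, 2, 0), 123);
    }
    ```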
    
    * docs: fix readme FGPA/FPGA typo (#1117)
    
    * fix: Use RDD partition index (#1112)
    
    * fix: Use RDD partition index
    
    * fix
    
    * fix
    
    * fix
    
    * fix: Various metrics bug fixes and improvements (#1111)
    
    * fix: Don't create CometScanExec for subclasses of ParquetFileFormat 
(#1129)
    
    * Use exact class comparison for parquet scan
    
    * Add test
    
    * Add comment
    
    * fix: Fix metrics regressions (#1132)
    
    * fix metrics issues
    
    * clippy
    
    * update tests
    
    * docs: Add more technical detail and new diagram to Comet plugin overview 
(#1119)
    
    * Add more technical detail and new diagram to Comet plugin overview
    
    * update diagram
    
    * add info on Arrow IPC
    
    * update diagram
    
    * update diagram
    
    * update docs
    
    * address feedback
    
    * Stop passing Java config map into native createPlan (#1101)
    
    * feat: Improve ScanExec native metrics (#1133)
    
    * save
    
    * remove shuffle jvm metric and update tuning guide
    
    * docs
    
    * add source for all ScanExecs
    
    * address feedback
    
    * address feedback
    
    * chore: Remove unused StringView struct (#1143)
    
    * Remove unused StringView struct
    
    * remove more dead code
    
    * docs: Add some documentation explaining how shuffle works (#1148)
    
    * add some notes on shuffle
    
    * reads
    
    * improve docs
    
    * test: enable more Spark 4.0 tests (#1145)
    
    ## Which issue does this PR close?
    
    Part of https://github.com/apache/datafusion-comet/issues/372 and 
https://github.com/apache/datafusion-comet/issues/551
    
    ## Rationale for this change
    
    To be ready for Spark 4.0
    
    ## What changes are included in this PR?
    
    This PR enables more Spark 4.0 tests that were fixed by recent changes
    
    ## How are these changes tested?
    
    tests enabled
    
    * chore: Refactor cast to use SparkCastOptions param (#1146)
    
    * Refactor cast to use SparkCastOptions param
    
    * update tests
    
    * update benches
    
    * update benches
    
    * update benches
    
    * Enable more scenarios in CometExecBenchmark. (#1151)
    
    * chore: Move more expressions from core crate to spark-expr crate (#1152)
    
    * move aggregate expressions to spark-expr crate
    
    * move more expressions
    
    * move benchmark
    
    * normalize_nan
    
    * bitwise not
    
    * comet scalar funcs
    
    * update bench imports
    
    * remove dead code (#1155)
    
    * fix: Spark 4.0-preview1 SPARK-47120 (#1156)
    
    ## Which issue does this PR close?
    
    Part of https://github.com/apache/datafusion-comet/issues/372 and 
https://github.com/apache/datafusion-comet/issues/551
    
    ## Rationale for this change
    
    To be ready for Spark 4.0
    
    ## What changes are included in this PR?
    
    This PR fixes the new test SPARK-47120 added in Spark 4.0
    
    ## How are these changes tested?
    
    tests enabled
    
    * chore: Move string kernels and expressions to spark-expr crate (#1164)
    
    * Move string kernels and expressions to spark-expr crate
    
    * remove unused hash kernel
    
    * remove unused dependencies
    
    * chore: Move remaining expressions to spark-expr crate + some minor 
refactoring (#1165)
    
    * move CheckOverflow to spark-expr crate
    
    * move NegativeExpr to spark-expr crate
    
    * move UnboundColumn to spark-expr crate
    
    * move ExpandExec from execution::datafusion::operators to 
execution::operators
    
    * refactoring to remove datafusion subpackage
    
    * update imports in benches
    
    * fix
    
    * fix
    
    * chore: Add ignored tests for reading complex types from Parquet (#1167)
    
    * Add ignored tests for reading structs from Parquet
    
    * add basic map test
    
    * add tests for Map and Array
    
    * feat: Add Spark-compatible implementation of SchemaAdapterFactory (#1169)
    
    * Add Spark-compatible SchemaAdapterFactory implementation
    
    * remove prototype code
    
    * fix
    
    * refactor
    
    * implement more cast logic
    
    * implement more cast logic
    
    * add basic test
    
    * improve test
    
    * cleanup
    
    * fmt
    
    * add support for casting unsigned int to signed int
    
    * clippy
    
    * address feedback
    
    * fix test
    
    * fix: Document enabling comet explain plan usage in Spark (4.0) (#1176)
    
    * test: enabling Spark tests with offHeap requirement (#1177)
    
    ## Which issue does this PR close?
    
    ## Rationale for this change
    
    After https://github.com/apache/datafusion-comet/pull/1062, we have not been running Spark tests for native execution.
    
    ## What changes are included in this PR?
    
    Removed the off heap requirement for testing
    
    ## How are these changes tested?
    
    Bringing back Spark tests for native execution
    
    * feat: Improve shuffle metrics (second attempt) (#1175)
    
    * improve shuffle metrics
    
    * docs
    
    * more metrics
    
    * refactor
    
    * address feedback
    
    * fix: stddev_pop should not directly return 0.0 when count is 1.0 (#1184)
    
    * add test
    
    * fix
    
    * fix
    
    * fix
    
    * feat: Make native shuffle compression configurable and respect 
`spark.shuffle.compress` (#1185)
    
    * Make shuffle compression codec and level configurable
    
    * remove lz4 references
    
    * docs
    
    * update comment
    
    * clippy
    
    * fix benches
    
    * clippy
    
    * clippy
    
    * disable test for miri
    
    * remove lz4 reference from proto
    
    * minor: move shuffle classes from common to spark (#1193)
    
    * minor: refactor decodeBatches to make private in broadcast exchange 
(#1195)
    
    * minor: refactor prepare_output so that it does not require an 
ExecutionContext (#1194)
    
    * fix: fix missing explanation for then branch in case when (#1200)
    
    * minor: remove unused source files (#1202)
    
    * chore: Upgrade to DataFusion 44.0.0-rc2 (#1154)
    
    * move aggregate expressions to spark-expr crate
    
    * move more expressions
    
    * move benchmark
    
    * normalize_nan
    
    * bitwise not
    
    * comet scalar funcs
    
    * update bench imports
    
    * save
    
    * save
    
    * save
    
    * remove unused imports
    
    * clippy
    
    * implement more hashers
    
    * implement Hash and PartialEq
    
    * implement Hash and PartialEq
    
    * implement Hash and PartialEq
    
    * benches
    
    * fix ScalarUDFImpl.return_type failure
    
    * exclude test from miri
    
    * ignore correct test
    
    * ignore another test
    
    * remove miri checks
    
    * use return_type_from_exprs
    
    * Revert "use return_type_from_exprs"
    
    This reverts commit febc1f1ec1301f9b359fc23ad6a117224fce35b7.
    
    * use DF main branch
    
    * hacky workaround for regression in ScalarUDFImpl.return_type
    
    * fix repo url
    
    * pin to revision
    
    * bump to latest rev
    
    * bump to latest DF rev
    
    * bump DF to rev 9f530dd
    
    * add Cargo.lock
    
    * bump DF version
    
    * no default features
    
    * Revert "remove miri checks"
    
    This reverts commit 4638fe3aa5501966cd5d8b53acf26c698b10b3c9.
    
    * Update pin to DataFusion e99e02b9b9093ceb0c13a2dd32a2a89beba47930
    
    * update pin
    
    * Update Cargo.toml
    
    Bump to 44.0.0-rc2
    
    * update cargo lock
    
    * revert miri change
    
    ---------
    
    Co-authored-by: Andrew Lamb <and...@nerdnetworks.org>
    
    * feat: add support for array_contains expression (#1163)
    
    * feat: add support for array_contains expression
    
    * test: add unit test for array_contains function
    
    * Removes unnecessary case expression for handling null values
    
    * update UT
    
    Signed-off-by: Dharan Aditya <dharan.adi...@gmail.com>
    
    * fix typo in UT
    
    Signed-off-by: Dharan Aditya <dharan.adi...@gmail.com>
    
    ---------
    
    Signed-off-by: Dharan Aditya <dharan.adi...@gmail.com>
    Co-authored-by: Andy Grove <agr...@apache.org>
    Co-authored-by: KAZUYUKI TANIMURA <ktanim...@apple.com>
    Co-authored-by: Parth Chandra <par...@apache.org>
    Co-authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    Co-authored-by: Raz Luvaton <16746759+rluva...@users.noreply.github.com>
    Co-authored-by: Andrew Lamb <and...@nerdnetworks.org>
    
    * feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting 
with various datafusion memory pool setups. (#1021)
    
    * feat: Reenable tests for filtered SMJ anti join (#1211)
    
    * feat: reenable filtered SMJ Anti join tests
    
    * feat: reenable filtered SMJ Anti join tests
    
    * feat: reenable filtered SMJ Anti join tests
    
    * feat: reenable filtered SMJ Anti join tests
    
    * Add CoalesceBatchesExec around SMJ with join filter
    
    * adding `CoalesceBatches`
    
    * adding `CoalesceBatches`
    
    * adding `CoalesceBatches`
    
    * feat: reenable filtered SMJ Anti join tests
    
    * feat: reenable filtered SMJ Anti join tests
    
    ---------
    
    Co-authored-by: Andy Grove <agr...@apache.org>
    
    * chore: Add safety check to CometBuffer (#1050)
    
    * chore: Add safety check to CometBuffer
    
    * Add CometColumnarToRowExec
    
    * fix
    
    * fix
    
    * more
    
    * Update plan stability results
    
    * fix
    
    * fix
    
    * fix
    
    * Revert "fix"
    
    This reverts commit 9bad173c7751f105bf3ded2ebc2fed0737d1b909.
    
    * Revert "Revert "fix""
    
    This reverts commit d527ad1a365d3aff64200ceba6d11cf376f3919f.
    
    * fix BucketedReadWithoutHiveSupportSuite
    
    * fix SparkPlanSuite
    
    * remove unreachable code (#1213)
    
    * test: Enable Comet by default except some tests in 
SparkSessionExtensionSuite (#1201)
    
    ## Which issue does this PR close?
    
    Part of https://github.com/apache/datafusion-comet/issues/1197
    
    ## Rationale for this change
    
    Since `loadCometExtension` in the diffs was not using `isCometEnabled`, `SparkSessionExtensionSuite` was not using Comet. Once enabled, some test failures were discovered.
    
    ## What changes are included in this PR?
    
    `loadCometExtension` now uses `isCometEnabled`, which enables Comet by default.
    Temporarily ignore the failing tests in SparkSessionExtensionSuite.
    
    ## How are these changes tested?
    
    existing tests
    
    * extract struct expressions to folders based on spark grouping (#1216)
    
    * chore: extract static invoke expressions to folders based on spark 
grouping (#1217)
    
    * extract static invoke expressions to folders based on spark grouping
    
    * Update native/spark-expr/src/static_invoke/mod.rs
    
    Co-authored-by: Andy Grove <agr...@apache.org>
    
    ---------
    
    Co-authored-by: Andy Grove <agr...@apache.org>
    
    * chore: Follow-on PR to fully enable onheap memory usage (#1210)
    
    * Make datafusion's native memory pool configurable
    
    * save
    
    * fix
    
    * Update memory calculation and add draft documentation
    
    * ready for review
    
    * ready for review
    
    * address feedback
    
    * Update docs/source/user-guide/tuning.md
    
    Co-authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    
    * Update docs/source/user-guide/tuning.md
    
    Co-authored-by: Kristin Cowalcijk <b...@wherobots.com>
    
    * Update docs/source/user-guide/tuning.md
    
    Co-authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    
    * Update docs/source/user-guide/tuning.md
    
    Co-authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    
    * remove unused config
    
    ---------
    
    Co-authored-by: Kristin Cowalcijk <b...@wherobots.com>
    Co-authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    
    * feat: Move shuffle block decompression and decoding to native code and 
add LZ4 & Snappy support (#1192)
    
    * Implement native decoding and decompression
    
    * revert some variable renaming for smaller diff
    
    * fix oom issues?
    
    * make NativeBatchDecoderIterator more consistent with ArrowReaderIterator
    
    * fix oom and prep for review
    
    * format
    
    * Add LZ4 support
    
    * clippy, new benchmark
    
    * rename metrics, clean up lz4 code
    
    * update test
    
    * Add support for snappy
    
    * format
    
    * change default back to lz4
    
    * make metrics more accurate
    
    * format
    
    * clippy
    
    * use faster unsafe version of lz4_flex
    
    * Make compression codec configurable for columnar shuffle
    
    * clippy
    
    * fix bench
    
    * fmt
    
    * address feedback
    
    * address feedback
    
    * address feedback
    
    * minor code simplification
    
    * cargo fmt
    
    * overflow check
    
    * rename compression level config
    
    * address feedback
    
    * address feedback
    
    * rename constant
    
    * chore: extract agg_funcs expressions to folders based on spark grouping 
(#1224)
    
    * extract agg_funcs expressions to folders based on spark grouping
    
    * fix rebase
    
    * extract datetime_funcs expressions to folders based on spark grouping 
(#1222)
    
    Co-authored-by: Andy Grove <agr...@apache.org>
    
    * chore: use datafusion from crates.io (#1232)
    
    * chore: extract strings file to `strings_func` like in spark grouping 
(#1215)
    
    * chore: extract predicate_functions expressions to folders based on spark 
grouping (#1218)
    
    * extract predicate_functions expressions to folders based on spark grouping
    
    * code review changes
    
    ---------
    
    Co-authored-by: Andy Grove <agr...@apache.org>
    
    * build(deps): bump protobuf version to 3.21.12 (#1234)
    
    * extract json_funcs expressions to folders based on spark grouping (#1220)
    
    Co-authored-by: Andy Grove <agr...@apache.org>
    
    * test: Enable shuffle by default in Spark tests (#1240)
    
    ## Which issue does this PR close?
    
    ## Rationale for this change
    
    Because `isCometShuffleEnabled` is false by default, some tests were not 
reached
    
    ## What changes are included in this PR?
    
    Removed `isCometShuffleEnabled` and updated spark test diff
    
    ## How are these changes tested?
    
    existing test
    
    * chore: extract hash_funcs expressions to folders based on spark grouping 
(#1221)
    
    * extract hash_funcs expressions to folders based on spark grouping
    
    * extract hash_funcs expressions to folders based on spark grouping
    
    ---------
    
    Co-authored-by: Andy Grove <agr...@apache.org>
    
    * fix: Fall back to Spark for unsupported partition or sort expressions in 
window aggregates (#1253)
    
    * perf: Improve query planning to more reliably fall back to columnar 
shuffle when native shuffle is not supported (#1209)
    
    * fix regression (#1259)
    
    * feat: add support for array_remove expression (#1179)
    
    * wip: array remove
    
    * added comet expression test
    
    * updated test cases
    
    * fixed array_remove function for null values
    
    * removed commented code
    
    * remove unnecessary code
    
    * updated the test for 'array_remove'
    
    * added test for array_remove in case the input array is null
    
    * wip: case array is empty
    
    * removed test case for empty array
    
    * fix: Fall back to Spark for distinct aggregates (#1262)
    
    * fall back to Spark for distinct aggregates
    
    * update expected plans for 3.4
    
    * update expected plans for 3.5
    
    * force build
    
    * add comment
    
    * feat: Implement custom RecordBatch serde for shuffle for improved 
performance (#1190)
    
    * Implement faster encoder for shuffle blocks
    
    * make code more concise
    
    * enable fast encoding for columnar shuffle
    
    * update benches
    
    * test all int types
    
    * test float
    
    * remaining types
    
    * add Snappy and Zstd(6) back to benchmark
    
    * fix regression
    
    * Update native/core/src/execution/shuffle/codec.rs
    
    Co-authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    
    * address feedback
    
    * support nullable flag
    
    ---------
    
    Co-authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    
    * docs: Update TPC-H benchmark results (#1257)
    
    * fix: disable initCap by default (#1276)
    
    * fix: disable initCap by default
    
    * Update spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala
    
    Co-authored-by: Andy Grove <agr...@apache.org>
    
    * address review comments
    
    ---------
    
    Co-authored-by: Andy Grove <agr...@apache.org>
    
    * chore: Add changelog for 0.5.0 (#1278)
    
    * Add changelog
    
    * revert accidental change
    
    * move 2 items to performance section
    
    * update TPC-DS results for 0.5.0 (#1277)
    
    * fix: cast timestamp to decimal is unsupported (#1281)
    
    * fix: cast timestamp to decimal is unsupported
    
    * fix style
    
    * revert test name and mark as ignore
    
    * add comment
    
    * chore: Start 0.6.0 development (#1286)
    
    * start 0.6.0 development
    
    * update some docs
    
    * Revert a change
    
    * update CI
    
    * docs: Fix links and provide complete benchmarking scripts (#1284)
    
    * fix links and provide complete scripts
    
    * fix path
    
    * fix incorrect text
    
    * feat: Add HasRowIdMapping interface (#1288)
    
    ---------
    
    Signed-off-by: Dharan Aditya <dharan.adi...@gmail.com>
    Co-authored-by: NoeB <noe.br...@bluewin.ch>
    Co-authored-by: Liang-Chi Hsieh <vii...@gmail.com>
    Co-authored-by: Raz Luvaton <raz.luva...@flarion.io>
    Co-authored-by: Andy Grove <agr...@apache.org>
    Co-authored-by: KAZUYUKI TANIMURA <ktanim...@apple.com>
    Co-authored-by: Sem <ssinche...@apache.org>
    Co-authored-by: Himadri Pal <meh...@gmail.com>
    Co-authored-by: himadripal <h...@apple.com>
    Co-authored-by: gstvg <28798827+gs...@users.noreply.github.com>
    Co-authored-by: Adam Binford <adam...@gmail.com>
    Co-authored-by: Matt Butrovich <mbutrov...@users.noreply.github.com>
    Co-authored-by: Raz Luvaton <16746759+rluva...@users.noreply.github.com>
    Co-authored-by: Andrew Lamb <and...@nerdnetworks.org>
    Co-authored-by: Dharan Aditya <dharan.adi...@gmail.com>
    Co-authored-by: Kristin Cowalcijk <b...@wherobots.com>
    Co-authored-by: Oleks V <comph...@users.noreply.github.com>
    Co-authored-by: Zhen Wang <643348...@qq.com>
    Co-authored-by: Jagdish Parihar <jatin6...@gmail.com>


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@datafusion.apache.org
For additional commands, e-mail: commits-h...@datafusion.apache.org
