andygrove opened a new issue, #3867: URL: https://github.com/apache/datafusion-comet/issues/3867
## Description

When using native Parquet scans (`COMET_PARQUET_SCAN_IMPL=native_datafusion`), Spark accumulators registered on the JVM side are not incremented: the native reader operates entirely in Rust via DataFusion and never calls back into the JVM accumulator APIs. Several Spark SQL tests therefore fail, because they rely on accumulators (e.g. `NumRowGroupsAcc`) to verify filter pushdown behavior. The filter pushdown itself works correctly in the native scan; it is only the test mechanism (the accumulator) that does not work across the JNI boundary.

## Affected Tests

The following Spark SQL tests are currently ignored with `IgnoreCometNativeScan` due to this limitation:

- `filter pushdown - StringPredicate` (ParquetFilterSuite) - uses `NumRowGroupsAcc` to verify string predicate pushdown
- `Filters should be pushed down for vectorized Parquet reader at row group level` (ParquetFilterSuite) - uses an accumulator to verify row-group-level filtering
- `SPARK-34562: Bloom filter push down` (ParquetFilterSuite) - uses an accumulator to verify bloom filter pushdown

These tests are skipped across all supported Spark versions (3.4, 3.5, 4.0).

## Possible Solutions

1. Propagate accumulator updates from native scans back to the JVM
2. Add equivalent native-side metrics that can be queried from the JVM after execution
3. Write alternative Comet-specific tests that verify the same pushdown behavior without relying on accumulators
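The failure mode can be illustrated with a minimal JVM-side sketch (class and method names here are illustrative, not Comet's or Spark's actual classes): an accumulator modeled loosely on Spark's `AccumulatorV2` pattern only changes when JVM code calls its `add` method. A scan running entirely in native Rust never makes that call, so the value a test asserts on stays at its initial state regardless of how many row groups were actually read or skipped.

```java
// Minimal sketch of a JVM-side accumulator, modeled loosely on Spark's
// AccumulatorV2 pattern. All names are illustrative, not real Comet/Spark APIs.
class RowGroupCountAccumulator {
    private long count = 0;

    // The JVM Parquet reader would call this for each row group it reads.
    void add(long rowGroups) { count += rowGroups; }

    long value() { return count; }
}

public class Demo {
    public static void main(String[] args) {
        // JVM vectorized reader path: reader code calls add(...) per row group,
        // so the test can assert on the accumulated value.
        RowGroupCountAccumulator jvmAcc = new RowGroupCountAccumulator();
        jvmAcc.add(1);
        System.out.println("jvm scan: " + jvmAcc.value());       // prints 1

        // Native DataFusion path: the scan happens in Rust and no JVM code
        // ever calls add(...), so the accumulator the test checks stays 0
        // even though the native scan performed the same row-group pruning.
        RowGroupCountAccumulator nativeAcc = new RowGroupCountAccumulator();
        System.out.println("native scan: " + nativeAcc.value()); // prints 0
    }
}
```

This is why options 1 and 2 above both amount to moving the count across the JNI boundary, either by pushing updates from Rust into the JVM accumulator or by pulling native-side metrics into the JVM after execution.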
