andygrove opened a new issue, #3867: URL: https://github.com/apache/datafusion-comet/issues/3867
## Description

When using native Parquet scans (`COMET_PARQUET_SCAN_IMPL=native_datafusion`), Spark accumulators registered on the JVM side are not incremented: the native reader operates entirely in Rust via DataFusion and never calls back into the JVM accumulator APIs. Several Spark SQL tests therefore fail, because they rely on accumulators (e.g. `NumRowGroupsAcc`) to verify filter pushdown behavior. The filter pushdown itself works correctly in the native scan; it is only the test mechanism (the accumulator) that does not work across the JNI boundary.

## Affected Tests

The following Spark SQL tests are currently ignored with `IgnoreCometNativeScan` due to this limitation:

- `filter pushdown - StringPredicate` (ParquetFilterSuite) - uses `NumRowGroupsAcc` to verify string predicate pushdown
- `Filters should be pushed down for vectorized Parquet reader at row group level` (ParquetFilterSuite) - uses an accumulator to verify row-group-level filtering
- `SPARK-34562: Bloom filter push down` (ParquetFilterSuite) - uses an accumulator to verify bloom filter pushdown

These tests are skipped across all supported Spark versions (3.4, 3.5, 4.0).

## Possible Solutions

1. Propagate accumulator updates from native scans back to the JVM
2. Add equivalent native-side metrics that can be queried from the JVM after execution
3. Write alternative Comet-specific tests that verify the same pushdown behavior without relying on accumulators
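The failure mode can be illustrated with a minimal JVM-side sketch (class and method names here are illustrative, not Comet's or Spark's actual classes): an accumulator modeled loosely on Spark's `AccumulatorV2` pattern only changes when JVM code calls its `add` method. A scan running entirely in native Rust never makes that call, so the value a test asserts on stays at its initial state regardless of how many row groups were actually read or skipped.

```java
// Minimal sketch of a JVM-side accumulator, modeled loosely on Spark's
// AccumulatorV2 pattern. All names are illustrative, not real Comet/Spark APIs.
class RowGroupCountAccumulator {
    private long count = 0;

    // The JVM Parquet reader would call this for each row group it reads.
    void add(long rowGroups) { count += rowGroups; }

    long value() { return count; }
}

public class Demo {
    public static void main(String[] args) {
        // JVM vectorized reader path: reader code calls add(...) per row group,
        // so the test can assert on the accumulated value.
        RowGroupCountAccumulator jvmAcc = new RowGroupCountAccumulator();
        jvmAcc.add(1);
        System.out.println("jvm scan: " + jvmAcc.value());       // prints 1

        // Native DataFusion path: the scan happens in Rust and no JVM code
        // ever calls add(...), so the accumulator the test checks stays 0
        // even though the native scan performed the same row-group pruning.
        RowGroupCountAccumulator nativeAcc = new RowGroupCountAccumulator();
        System.out.println("native scan: " + nativeAcc.value()); // prints 0
    }
}
```

This is why options 1 and 2 above both amount to moving the count across the JNI boundary, either by pushing updates from Rust into the JVM accumulator or by pulling native-side metrics into the JVM after execution.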
