Shekharrajak commented on code in PR #3060:
URL: https://github.com/apache/datafusion-comet/pull/3060#discussion_r2682277730
##########
spark/src/main/scala/org/apache/comet/serde/operator/CometNativeScan.scala:
##########
@@ -191,6 +193,10 @@ object CometNativeScan extends CometOperatorSerde[CometScanExec] with Logging {
}
}
+ // Add runtime filter bounds if available
+ // These are pushed down from join operators to enable I/O reduction
+ addRuntimeFilterBounds(scan, nativeScanBuilder)
Review Comment:
Thanks for sharing. I tried running the runtime filters natively but did not
see any improvement.

The benchmark shows Comet is slower than Spark for these workloads. This could
be due to:

- JNI overhead: crossing the JVM/native boundary has a significant cost.
- Spark's vectorized reader: it is highly optimized for in-memory Parquet reading.
- Runtime filters not being effective: the filters may not be pruning row groups because:
  - the benchmark Parquet files don't have bloom filters written
  - the data is in only a few row groups (small files)
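
To illustrate the row-group point, here is a rough sketch of min/max pruning in plain Scala. It is not the Comet or Parquet reader code: `RowGroupStats`, `FilterBounds`, and `RowGroupPruningSketch` are hypothetical names, and it assumes the runtime filter arrives as a simple lower/upper bound on the join key and that a row group can only be skipped when its min/max statistics fall entirely outside that bound. Bloom filters are not modelled here.

```scala
// Hypothetical model: each row group exposes only min/max statistics for the join key.
final case class RowGroupStats(min: Long, max: Long, rows: Long)

// Hypothetical shape of runtime filter bounds derived from the build side of a join.
final case class FilterBounds(lower: Long, upper: Long)

object RowGroupPruningSketch {
  // A row group can be skipped only if its [min, max] range cannot overlap the bounds.
  def canSkip(rg: RowGroupStats, b: FilterBounds): Boolean =
    rg.max < b.lower || rg.min > b.upper

  // Fraction of rows that pruning avoids reading.
  def skippedFraction(groups: Seq[RowGroupStats], b: FilterBounds): Double = {
    val total = groups.map(_.rows).sum
    val skipped = groups.filter(canSkip(_, b)).map(_.rows).sum
    if (total == 0) 0.0 else skipped.toDouble / total
  }

  def main(args: Array[String]): Unit = {
    val bounds = FilterBounds(100L, 200L)
    // Small file: a single row group spanning the whole key range.
    val oneGroup = Seq(RowGroupStats(0L, 999L, 1000L))
    // Same key range split into ten sorted row groups of 100 rows each.
    val tenGroups = (0 until 10).map(i => RowGroupStats(i * 100L, i * 100L + 99L, 100L))
    println(skippedFraction(oneGroup, bounds))
    println(skippedFraction(tenGroups, bounds))
  }
}
```

With a single row group covering keys 0 to 999, the bounds always overlap and nothing is skipped. With ten sorted row groups, eight of the ten fall outside the bounds. Under these assumptions, small benchmark files with one or two row groups would give the filter almost nothing to prune, whatever the selectivity of the join.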
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]