Re: [PR] feat: Enable DPP support with native_datafusion scan [datafusion-comet]

via GitHub Mon, 12 Jan 2026 05:27:57 -0800


Shekharrajak commented on code in PR #3060:
URL: https://github.com/apache/datafusion-comet/pull/3060#discussion_r2682277730



##########
spark/src/main/scala/org/apache/comet/serde/operator/CometNativeScan.scala:
##########
@@ -191,6 +193,10 @@ object CometNativeScan extends CometOperatorSerde[CometScanExec] with Logging {
         }
       }
 
+      // Add runtime filter bounds if available
+      // These are pushed down from join operators to enable I/O reduction
+      addRuntimeFilterBounds(scan, nativeScanBuilder)

Review Comment:
   Thanks for sharing. I tried executing the runtime filters natively but do 
   not see any improvement.
   
   The benchmark shows Comet is slower than Spark for the benchmarked 
   workloads. This could be due to:
   
   - JNI overhead: crossing the JVM/native boundary has a significant cost.
   - Spark's vectorized reader: it is highly optimized for in-memory Parquet 
     reading.
   - Runtime filters not being effective: the filters may not be pruning row 
     groups because:
     - the benchmark Parquet files don't have bloom filters written;
     - the data sits in only a few row groups (small files).
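
   To illustrate the last point, here is a minimal, self-contained sketch of 
   min/max row-group pruning. It is not Comet's or DataFusion's 
   implementation, and it does not model what addRuntimeFilterBounds does in 
   this PR. RowGroupStats, the key layout, and the filter bounds are all 
   hypothetical and chosen only to show why bounds cannot skip anything when 
   the data sits in one wide row group.

```scala
// Hypothetical model of a Parquet row group: only the min/max statistics
// of the join key and the row count. Not a Comet or Parquet API.
final case class RowGroupStats(min: Long, max: Long, rows: Long)

object RuntimeFilterPruningSketch {
  // A row group must be read only if its [min, max] range overlaps
  // the runtime filter bounds [lo, hi].
  def survives(rg: RowGroupStats, lo: Long, hi: Long): Boolean =
    rg.max >= lo && rg.min <= hi

  // Total rows that still have to be read after pruning.
  def rowsRead(groups: Seq[RowGroupStats], lo: Long, hi: Long): Long =
    groups.filter(survives(_, lo, hi)).map(_.rows).sum

  // The same 1,000,000 keys (0 to 999999), laid out two ways (assumed data).
  val singleGroup: Seq[RowGroupStats] =
    Seq(RowGroupStats(0L, 999999L, 1000000L))

  val tenSortedGroups: Seq[RowGroupStats] =
    (0 until 10).map { i =>
      RowGroupStats(i * 100000L, i * 100000L + 99999L, 100000L)
    }

  def main(args: Array[String]): Unit = {
    // Assumed bounds that a join build side might produce.
    val (lo, hi) = (250000L, 260000L)
    println(s"single row group:      ${rowsRead(singleGroup, lo, hi)} rows read")
    println(s"ten sorted row groups: ${rowsRead(tenSortedGroups, lo, hi)} rows read")
  }
}
```

   With these assumed inputs the single row group is read in full, while the 
   ten sorted row groups let the bounds skip nine of them. If the benchmark 
   files look like the first layout, the filter bounds would add evaluation 
   cost without reducing I/O.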



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

