Re: [PR] feat: accelerate Iceberg RewriteDataFiles reads via Comet native scan [datafusion-comet]

via GitHub Thu, 07 May 2026 10:22:30 -0700


mbutrovich commented on PR #4251:
URL: 
https://github.com/apache/datafusion-comet/pull/4251#issuecomment-4399424325


   Thanks for this, Jordan. Nice find that SparkStagedScan is the entry point 
for RewriteDataFiles, and the listener-based plan capture in the test reads 
clean.
   
   A few questions.
   
   On the Spark 3.4 CI failure from before the force-push, do you still have 
the stack? I checked Iceberg 1.5.2 and the SparkScan / 
SparkPartitioningAwareScan hierarchy matches 3.5, so the three accessors in 
`IcebergReflection` should resolve the same way. My guess is the failure is in 
the test's own fluent-builder reflection against the Actions API rather than 
the new reflection helpers, but hard to say without the trace.
   
   For test coverage, have you tried a rewrite over a table that already has 
MOR position deletes applied? The current suite exercises plain data files 
only, and the behavior I most want pinned down is that the flattened 
FileScanTasks carry their `deletes()` through the native path. A binPack over a 
table with one row deleted would be the direct assertion.
   
   Smaller thing: the metadata-table guard at `CometScanRule.scala:98-116` 
already rejects `.position_deletes` via suffix match, which I think is what 
keeps the new SparkStagedScan case safely scoped to data-file rewrites and away 
from `RewritePositionDeleteFilesSparkAction`. A short comment near the 
StagedScan match pointing at that upstream check would have saved me a detour 
on review.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: accelerate Iceberg RewriteDataFiles reads via Comet native scan [datafusion-comet]

Reply via email to