mbutrovich commented on PR #4251: URL: https://github.com/apache/datafusion-comet/pull/4251#issuecomment-4399424325
Thanks for this, Jordan. Nice find that SparkStagedScan is the entry point for RewriteDataFiles, and the listener-based plan capture in the test reads clean. A few questions. On the Spark 3.4 CI failure from before the force-push, do you still have the stack? I checked Iceberg 1.5.2 and the SparkScan / SparkPartitioningAwareScan hierarchy matches 3.5, so the three accessors in `IcebergReflection` should resolve the same way. My guess is the failure is in the test's own fluent-builder reflection against the Actions API rather than the new reflection helpers, but hard to say without the trace. For test coverage, have you tried a rewrite over a table that already has MOR position deletes applied? The current suite exercises plain data files only, and the behavior I most want pinned down is that the flattened FileScanTasks carry their `deletes()` through the native path. A binPack over a table with one row deleted would be the direct assertion. Smaller thing: the metadata-table guard at `CometScanRule.scala:98-116` already rejects `.position_deletes` via suffix match, which I think is what keeps the new SparkStagedScan case safely scoped to data-file rewrites and away from `RewritePositionDeleteFilesSparkAction`. A short comment near the StagedScan match pointing at that upstream check would have saved me a detour on review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
