mbutrovich commented on issue #20324: URL: https://github.com/apache/datafusion/issues/20324#issuecomment-4803341556
I might have found an issue in Comet that would make this worse. It seems like the `PhysicalExprAdapter` in DF 53 introduces casts on type mismatches (even nullability mismatches) that Comet then replaces with a Spark-compatible `SparkCast` (I'm not even sure that's the correct behavior in Comet for Parquet pruning). However, `SparkCast` isn't recognized by `PruningPredicate`, so I don't think we're getting correct metadata-level pruning with Comet. That means we might be hammering the row-filter logic harder than we need to, because the coarse-grained filters aren't doing their work first. If this is the case, all of those rows were hitting the `CometFilter` node anyway, so we're still net slower with row-level filters on, but let me revisit Comet performance after this PR merges: https://github.com/apache/datafusion-comet/pull/4730
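To make the suspected mechanism concrete, here is a toy sketch in Rust. It is not DataFusion or Comet code: `Expr`, `GtPredicate`, `Stats`, `is_prunable`, and the other names are hypothetical, and it assumes the pruner conservatively keeps every row group when it meets an expression it does not recognize. `OpaqueCast` stands in for a custom cast such as `SparkCast`.

```rust
// Toy model only; these types are hypothetical, not DataFusion's.
#[derive(Debug)]
enum Expr {
    /// The single column being filtered.
    Column,
    /// A cast the pruner knows how to look through.
    Cast(Box<Expr>),
    /// A custom cast the pruner does not recognize (stand-in for `SparkCast`).
    OpaqueCast(Box<Expr>),
}

/// A predicate of the form `<left> > literal`.
struct GtPredicate {
    left: Expr,
    literal: i64,
}

/// Min/max statistics for one row group.
struct Stats {
    min: i64,
    max: i64,
}

/// True if `expr` is the column, possibly wrapped in recognized casts.
fn is_prunable(expr: &Expr) -> bool {
    match expr {
        Expr::Column => true,
        Expr::Cast(inner) => is_prunable(inner),
        Expr::OpaqueCast(_) => false,
    }
}

/// True if the row group might contain matching rows and must be read.
fn may_contain_matches(pred: &GtPredicate, stats: &Stats) -> bool {
    if !is_prunable(&pred.left) {
        // Assumed fallback: an unrecognized expression cannot be evaluated
        // against statistics, so the row group is kept.
        return true;
    }
    debug_assert!(stats.min <= stats.max);
    // `col > literal` can only match if the maximum exceeds the literal.
    stats.max > pred.literal
}

/// Counts the row groups that survive metadata-level pruning.
fn row_groups_to_read(pred: &GtPredicate, groups: &[Stats]) -> usize {
    groups.iter().filter(|s| may_contain_matches(pred, s)).count()
}
```

With three row groups covering 0..=10, 11..=20, and 21..=30 and the predicate `col > 15`, the recognized `Cast` variant lets the first group be skipped, while the `OpaqueCast` variant keeps all three, leaving every row for the row-level filter.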
