2010YOUY01 commented on issue #18319: URL: https://github.com/apache/datafusion/issues/18319#issuecomment-3748471631
> that PR has a very long description. At a glance it definitely seems relevant but I can't tell how long it would take to accommodate dates and I'm not sure parquet automatically maintains statistics for dates by default.

One issue with #19487 is that it may take longer to land in a release. The major challenge is review latency (any help to move it forward sooner would be greatly appreciated). Making `date_trunc` prunable itself should be fairly straightforward.

This rewrite-based approach is also much more likely to get merged quickly, so it makes sense to proceed with it if you need a solution in the near term.

The gist of #19487 is that we let the pruning framework handle arbitrarily complex predicates automatically. Otherwise we have to assume the pruner can only handle naive exprs like `col < constant`, and we have to maintain dozens of rewrite rules like this one. A rewrite rule is also not flexible enough to handle slightly more complex patterns: it rewrites `date_trunc(part, column) <= constant_rhs`, but it might fail to prune if we wrap `column` with one additional date-related function. So I believe #19487 is the better long-term solution.

> Also, it doesn't seem efficient for this particular optimization because it evaluates per batch (micro-partition?) something that can be evaluated once up front. I think the only scenario it has equivalent performance is in the case of tens of batches and when the plan is not cached in any way.

For the performance part, I think this rewrite does fold one constant value in the example (the RHS of `<=`), but the `<=` expr still has to be evaluated on all containers anyway.

```
column <= date_trunc(part, date_add(constant_rhs, INTERVAL 1 part))
```
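To make the performance point concrete, here is a minimal Python sketch of the rewrite, not DataFusion code. The helper names (`date_trunc`, `next_boundary`, `can_skip`) are hypothetical and only `day`, `month`, and `year` are handled. The folded bound is computed once from `constant_rhs`, while the comparison against each container's min statistic still runs per container. The sketch uses a strict `<`, which is exact for dates; the `<=` form above is a slightly looser bound that is still safe for pruning.

```python
from datetime import date, timedelta


def date_trunc(part: str, d: date) -> date:
    """Truncate a date to the start of its day, month, or year."""
    if part == "day":
        return d
    if part == "month":
        return d.replace(day=1)
    if part == "year":
        return d.replace(month=1, day=1)
    raise ValueError(f"unsupported part: {part}")


def next_boundary(part: str, d: date) -> date:
    """Return the start of the period after the one containing d.

    This plays the role of date_trunc(part, date_add(d, INTERVAL 1 part)).
    """
    start = date_trunc(part, d)
    if part == "day":
        return start + timedelta(days=1)
    if part == "month":
        # Roll over to January of the next year after December.
        if start.month == 12:
            return date(start.year + 1, 1, 1)
        return start.replace(month=start.month + 1)
    return date(start.year + 1, 1, 1)  # part == "year"


def can_skip(part: str, rhs: date, col_min: date) -> bool:
    """True if no row in a container can satisfy date_trunc(part, col) <= rhs.

    Relies on the rewritten form col < next_boundary(part, rhs), so only
    the container's minimum statistic is needed.
    """
    return col_min >= next_boundary(part, rhs)


# The bound is folded once; the comparison still runs for every container.
rhs = date(2024, 3, 15)
bound = next_boundary("month", rhs)
container_mins = [date(2024, 1, 10), date(2024, 3, 31), date(2024, 4, 1)]
kept = [m for m in container_mins if m < bound]
```

In this sketch only the last container is pruned, since its minimum already sits at the folded bound.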
