alamb commented on issue #15512: URL: https://github.com/apache/datafusion/issues/15512#issuecomment-2816765297
> Currently, q23 takes approximately 6 seconds to execute. I have confirmed that DataFusion does not have the aforementioned optimizations and still scans a very large number of rows and columns. By the way, is there a convenient way in `datafusion-cli` to view statistics on the number of rows and columns scanned? Currently, I directly print the batch information in the `Like` expression, which gives the following output (it seems endless, and the amount of data being scanned appears to be very large, all with exactly 105 columns):

@acking-you I agree with your analysis and arrived at a similar conclusion in this ticket:
- https://github.com/apache/datafusion/issues/15177

> It seems that there is no mention of deferred materialization for "order by limit" (perhaps I missed it since the content of the issue is quite long). So, have we considered optimizing "order by limit" in two phases? I'm planning to review and study the PR you mentioned over the weekend. Thanks again for your reply.

I also agree: deferred materialization is key to improving performance massively. I believe this is the effect of https://github.com/apache/datafusion/issues/3463, though it does not use that term.

> Please see https://github.com/apache/datafusion/issues/3463 which @zhuqi-lucas linked to above.

So, TL;DR: by combining the following two items
- https://github.com/apache/datafusion/issues/3463
- https://github.com/apache/datafusion/pull/15301

I think DataFusion will have the equivalent of materialized filter.
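To illustrate the general two-phase idea asked about for "order by limit", here is a minimal sketch of deferred materialization under simplifying assumptions: the table is an in-memory set of columns, the sort key is a single `i64` column sorted ascending, and row ids are plain vector indices. Phase 1 reads only the sort-key column and keeps a max-heap of the best `k` rows, whose root acts as a running threshold. Phase 2 fetches the wide payload columns for just those `k` rows. The names `Table`, `top_k_row_ids`, and `materialize` are hypothetical and are not DataFusion APIs. This is not how the issue and PR linked above implement it; it only shows why reading the other columns late saves work.

```rust
use std::collections::BinaryHeap;

/// Hypothetical toy columnar table (not a DataFusion type):
/// one sort-key column plus many wide payload columns.
struct Table {
    sort_key: Vec<i64>,
    payload: Vec<Vec<String>>, // payload[col][row]
}

/// Phase 1: scan ONLY the sort-key column and return the row ids of the
/// k smallest keys (ORDER BY sort_key ASC LIMIT k), in ascending key order.
fn top_k_row_ids(sort_key: &[i64], k: usize) -> Vec<usize> {
    if k == 0 {
        return Vec::new();
    }
    // Max-heap of (key, row_id): once full, the root holds the current
    // k-th smallest key, i.e. the threshold a new row must beat.
    let mut heap: BinaryHeap<(i64, usize)> = BinaryHeap::with_capacity(k + 1);
    for (row, &key) in sort_key.iter().enumerate() {
        if heap.len() < k {
            heap.push((key, row));
        } else if let Some(&(threshold, _)) = heap.peek() {
            // Rows that cannot beat the threshold are dropped after
            // reading only their sort key; no payload column is touched.
            if key < threshold {
                heap.pop();
                heap.push((key, row));
            }
        }
    }
    // into_sorted_vec yields (key, row_id) pairs in ascending order.
    heap.into_sorted_vec().into_iter().map(|(_, row)| row).collect()
}

/// Phase 2: materialize every payload column, but only for the selected rows.
fn materialize(table: &Table, rows: &[usize]) -> Vec<Vec<String>> {
    rows.iter()
        .map(|&r| table.payload.iter().map(|col| col[r].clone()).collect())
        .collect()
}

fn main() {
    let n_rows: usize = 10_000;
    // Hypothetical: 104 payload columns + 1 sort key = 105 columns in total.
    let n_payload_cols: usize = 104;
    let table = Table {
        // Distinct, shuffled keys; row 0 gets key 0, the minimum.
        sort_key: (0..n_rows as i64).map(|i| (i * 7919) % 10_007).collect(),
        payload: (0..n_payload_cols)
            .map(|c| (0..n_rows).map(|r| format!("c{c}_r{r}")).collect())
            .collect(),
    };

    let k = 10;
    let rows = top_k_row_ids(&table.sort_key, k);
    let result = materialize(&table, &rows);

    // Eager materialization reads every payload cell; the deferred plan
    // reads payload cells only for the k surviving rows.
    println!("payload cells read eagerly:  {}", n_rows * n_payload_cols);
    println!("payload cells read deferred: {}", result.len() * n_payload_cols);
    println!("first row (first 2 cols): {:?}", &result[0][..2]);
}
```

The design choice is that the expensive columns are decoded once per surviving row instead of once per scanned row; a real engine would apply the same principle at the scan level (for example by pushing the running threshold down as a filter) rather than on in-memory vectors.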