alamb commented on issue #15512:
URL: https://github.com/apache/datafusion/issues/15512#issuecomment-2816765297

   > Currently, q23 takes approximately 6 seconds to execute. I have confirmed 
that DataFusion does not have the aforementioned optimizations and still scans 
a very large number of rows and columns. By the way, is there a convenient way 
in `datafusion-cli` to view statistics on the number of rows and columns 
scanned? Currently, I directly print the batch information in the `Like` 
expression, which gives the following output (it seems endless, and the amount 
of data being scanned appears to be very large, all with exactly 105 columns):
   
   @acking-you  I agree with your analysis and arrived at a similar conclusion 
in this ticket: 
   -  https://github.com/apache/datafusion/issues/15177
   
   
   > It seems that there is no mention of deferred materialization for "order 
by limit" (perhaps I missed it since the content of the issue is quite long). 
So, have we considered optimizing "order by limit" in two phases? I'm planning 
to review and study the PR you mentioned over the weekend. Thanks again for 
your reply.
   
   I also agree that deferred materialization is key to improving performance 
massively. I believe this is the effect of 
https://github.com/apache/datafusion/issues/3463, though that issue does not 
use the term.
   
   > Please see https://github.com/apache/datafusion/issues/3463 which 
@zhuqi-lucas linked to above.
   
   So the TL;DR is that by combining the following two items:
   - https://github.com/apache/datafusion/issues/3463
   - https://github.com/apache/datafusion/pull/15301
   
   I think DataFusion will have the equivalent of a materialized filter.
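
   To make the idea concrete, here is a minimal sketch of deferred 
materialization for a filter + "order by limit" query. It is a toy in plain 
Rust, not DataFusion code and not how either linked item is implemented. The 
`Table` struct, the column names, and the helpers `top_k_row_ids` and 
`materialize` are all hypothetical, and `payload` stands in for the many wide 
columns that a `SELECT *` would otherwise decode for every scanned row.

```rust
// Toy columnar table (hypothetical): each Vec is one column, and row `i`
// is spread across all of them.
struct Table {
    url: Vec<String>,     // column used by the filter
    event_time: Vec<i64>, // column used by ORDER BY
    payload: Vec<String>, // stands in for the remaining wide columns
}

/// Phase 1: touch only the filter and sort-key columns and return the row
/// indices of the `limit` smallest `event_time` values among matching rows.
fn top_k_row_ids(t: &Table, needle: &str, limit: usize) -> Vec<usize> {
    let mut ids: Vec<usize> = (0..t.url.len())
        .filter(|&i| t.url[i].contains(needle))
        .collect();
    // Ties are broken by row index so the result is deterministic.
    ids.sort_by_key(|&i| (t.event_time[i], i));
    ids.truncate(limit);
    ids
}

/// Phase 2: materialize the wide columns, but only for the surviving rows.
fn materialize(t: &Table, ids: &[usize]) -> Vec<(String, i64, String)> {
    ids.iter()
        .map(|&i| (t.url[i].clone(), t.event_time[i], t.payload[i].clone()))
        .collect()
}

fn main() {
    let t = Table {
        url: vec![
            "google.com/a".to_string(),
            "example.org".to_string(),
            "maps.google.com".to_string(),
            "google.com/b".to_string(),
        ],
        event_time: vec![30, 10, 50, 20],
        payload: vec![
            "p0".to_string(),
            "p1".to_string(),
            "p2".to_string(),
            "p3".to_string(),
        ],
    };

    // Only `url` and `event_time` are read to pick the winning rows.
    let ids = top_k_row_ids(&t, "google", 2);

    // `payload` is read for at most `limit` rows.
    for row in materialize(&t, &ids) {
        println!("{:?}", row);
    }
}
```

   In this sketch the expensive columns are read for at most `limit` rows 
instead of for every row that reaches the filter, which is where the savings 
would come from on a table with roughly 105 columns.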


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

