zhuqi-lucas commented on issue #17172:
URL: https://github.com/apache/datafusion/issues/17172#issuecomment-3199076705

   > > 1. Is the idea here essentially to push down the TopK (ORDER BY … LIMIT 
k) into the ParquetExec, so that the scan itself can stop early rather than 
decoding the full file?
   > 
   > Not directly, though that could be a follow-up. The primary idea is that 
you can stream Parquet data in reverse order.
   > 
   > > And if so, does the “fast parquet order inversion” optimization further 
enhance this pushdown, since it allows the scan to read from the end of the 
file and flip pages efficiently?
   > 
   > I think it would.
   > 
   > > Can we still benefit from this umbrella issue if we don't have TopK pushdown?
   > 
   > Yes, because you can still stream-process the Parquet data in reverse 
order. Right now, you cannot stream in reverse order at all: all data has to be 
loaded into memory and is then reversed by a `SortExec`.
   
   Thank you @crepererum for the explanation, it makes sense to me now. This 
looks like a great improvement!
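
To make the difference concrete, here is a toy sketch of the two strategies discussed above. It is only an illustration under stated assumptions: `ToyFile`, `top_k_full_sort`, and `top_k_reverse_stream` are hypothetical names, not DataFusion or parquet APIs, and the "file" is just in-memory row groups whose values are already globally ascending.

```rust
/// Hypothetical stand-in for a Parquet file: row groups whose values are
/// globally ascending (an assumption of this sketch, not a Parquet guarantee).
struct ToyFile {
    row_groups: Vec<Vec<i64>>,
}

/// Baseline: materialize every row group, then sort descending and truncate.
/// Returns the result and the number of row groups that had to be decoded.
fn top_k_full_sort(file: &ToyFile, k: usize) -> (Vec<i64>, usize) {
    let mut all: Vec<i64> = file.row_groups.iter().flatten().copied().collect();
    all.sort_unstable_by(|a, b| b.cmp(a));
    all.truncate(k);
    (all, file.row_groups.len())
}

/// Reverse streaming: visit row groups from last to first, emit each one
/// back to front, and stop as soon as k rows have been produced.
fn top_k_reverse_stream(file: &ToyFile, k: usize) -> (Vec<i64>, usize) {
    let mut out = Vec::with_capacity(k);
    let mut decoded = 0;
    for group in file.row_groups.iter().rev() {
        if out.len() >= k {
            break;
        }
        decoded += 1;
        let remaining = k - out.len();
        out.extend(group.iter().rev().copied().take(remaining));
    }
    (out, decoded)
}

fn main() {
    let file = ToyFile {
        row_groups: vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]],
    };
    let (full, full_groups) = top_k_full_sort(&file, 2);
    let (streamed, streamed_groups) = top_k_reverse_stream(&file, 2);
    println!("full sort: {:?} after decoding {} row groups", full, full_groups);
    println!("reverse stream: {:?} after decoding {} row groups", streamed, streamed_groups);
}
```

Both functions return the same rows, but the streaming version only touches as many row groups as it needs. Even without a limit, the reverse iterator would produce rows incrementally instead of holding the whole file in memory before the first row comes out.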


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

