zhuqi-lucas commented on issue #17172:
URL: https://github.com/apache/datafusion/issues/17172#issuecomment-3199076705
> > 1. Is the idea here essentially to push down the TopK (ORDER BY … LIMIT k) into the ParquetExec, so that the scan itself can stop early rather than decoding the full file?
>
> Not directly, though that could be a follow-up. The idea is primarily that you can stream parquet data in reverse order.
>
> > And if so, does the “fast parquet order inversion” optimization further enhance this pushdown, since it allows the scan to read from the end of the file and flip pages efficiently?
>
> I think it would.
>
> > Can we still benefit from this umbrella if we don't have TopK pushdown?
>
> Yes, because you can still stream-process the parquet data in reverse order. Right now, you cannot stream in reverse order at all: all data has to be loaded into memory and is then reversed by a `SortExec`.

Thank you @crepererum for the explanation; it makes sense to me now. This looks like a great improvement!

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org