Re: [I] Support limit pruning [datafusion]

via GitHub Fri, 21 Nov 2025 06:20:58 -0800


alamb commented on issue #18860:
URL: https://github.com/apache/datafusion/issues/18860#issuecomment-3563221015


   FWIW if the limit is pushed into the parquet reader, it will internally skip 
reading subsequent row groups once the limit is reached. Here is some of the 
relevant code:
   
   
https://github.com/apache/arrow-rs/blob/ed9efe78e4cc958cc96707557818e754419debb0/parquet/src/arrow/arrow_reader/read_plan.rs#L254-L294
   
   So in other words, unless limit pruning lets you skip additional files / IO 
beyond what the reader already skips, I suspect applying a limit in DataFusion 
to skip row groups might not improve performance much.
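
   As a rough illustration of the behavior described above, here is a minimal 
standalone sketch. It is not the arrow-rs implementation: the function name 
`row_groups_needed` and its shape are hypothetical, and it only models the idea 
that a reader with a pushed-down limit stops touching row groups once enough 
rows have been produced. It also assumes no predicate is applied, so every row 
counts toward the limit.

```rust
/// Hypothetical model (not the arrow-rs API): given the row count of each
/// row group and a limit, return `(row_group_index, rows_to_read)` for every
/// row group that has to be read to satisfy the limit.
fn row_groups_needed(row_counts: &[usize], limit: usize) -> Vec<(usize, usize)> {
    let mut remaining = limit;
    let mut plan = Vec::new();

    for (idx, &rows) in row_counts.iter().enumerate() {
        if remaining == 0 {
            // Limit reached: all later row groups are skipped entirely.
            break;
        }
        // Read only as many rows as are still needed from this row group.
        let take = rows.min(remaining);
        if take > 0 {
            plan.push((idx, take));
        }
        remaining -= take;
    }

    plan
}
```

   With four row groups of 100 rows each and a limit of 150, 
`row_groups_needed(&[100, 100, 100, 100], 150)` returns `[(0, 100), (1, 50)]`, 
so the last two row groups are never read. Under this model, pruning row groups 
a second time at the DataFusion level would not remove any further IO.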
   
   
   > And one (row group 3) of the four is fully matched with the predicates, 
and the others are not, which are only partially matched.
   
   What does "fully matched" / "partially matched" mean in this case? Does that 
mean all the rows in the row groups would be filtered?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

