xudong963 commented on issue #18860:
URL: https://github.com/apache/datafusion/issues/18860#issuecomment-3563295086

   > FWIW if the limit is pushed into the parquet reader, it will internally 
skip reading future row groups once the limit is reached. Here is some of the 
relevant code
   
   If there's a filter, I think we still need to do row group pruning first. 
Then, for the matched row groups, we apply the row filters and collect rows 
until the limit is reached.
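
A minimal sketch of that order of operations, assuming a single predicate `value > threshold` and per-row-group min/max statistics. `RowGroup` and `scan_with_limit` are hypothetical names for illustration, not DataFusion or parquet reader APIs.

```python
from dataclasses import dataclass


@dataclass
class RowGroup:
    min_val: int  # column minimum taken from (hypothetical) statistics
    max_val: int  # column maximum taken from (hypothetical) statistics
    rows: list    # stand-in for the decoded column values


def scan_with_limit(row_groups, threshold, limit):
    """Return up to `limit` values that satisfy `value > threshold`."""
    out = []
    if limit <= 0:
        return out
    for rg in row_groups:
        # Row group pruning: the statistics prove that no row can match.
        if rg.max_val <= threshold:
            continue
        # Row filter applied to the matched row group.
        for v in rg.rows:
            if v > threshold:
                out.append(v)
                if len(out) == limit:
                    # Limit reached: remaining row groups are never read.
                    return out
    return out
```

With row groups `[1, 3, 5]`, `[4, 12, 11]`, and `[20, 25, 30]`, a threshold of 10 and a limit of 3, the first group is pruned, the second contributes two rows after filtering, and the scan stops after the first row of the third.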
   
   
   > What does "fully matches" /"partially matched" mean in this case? Does 
that mean all the rows in the row groups would be filtered?
   
   Currently, row group pruning classifies each row group as either pruned or 
matched. What I mean here is that a matched row group can be further 
subdivided into partially matched and fully matched.
   
   If we then use the fully matched row groups to return the limit k rows, we 
can reduce the cost of fetching partially matched row groups and applying row 
filters to them.
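
One way to picture the three-way classification is the sketch below. It assumes a predicate `value > threshold`, min/max statistics with no nulls, and a limit without ORDER BY. The names `Match`, `RowGroupStats`, `classify`, and `plan_for_limit` are hypothetical and do not correspond to existing DataFusion code.

```python
from dataclasses import dataclass
from enum import Enum


class Match(Enum):
    PRUNED = "pruned"              # no row can satisfy the predicate
    PARTIAL = "partially matched"  # some rows may satisfy it
    FULL = "fully matched"         # every row satisfies it


@dataclass
class RowGroupStats:
    min_val: int
    max_val: int
    num_rows: int


def classify(stats, threshold):
    """Classify a row group for the predicate `value > threshold`."""
    if stats.max_val <= threshold:
        return Match.PRUNED
    if stats.min_val > threshold:
        return Match.FULL
    return Match.PARTIAL


def plan_for_limit(all_stats, threshold, limit):
    """Pick the row group indices to read for `... LIMIT limit`."""
    kinds = [classify(s, threshold) for s in all_stats]
    chosen, rows = [], 0
    for i, kind in enumerate(kinds):
        if kind is Match.FULL:
            chosen.append(i)
            rows += all_stats[i].num_rows
            if rows >= limit:
                # Fully matched groups alone cover the limit, so partially
                # matched groups are neither fetched nor row-filtered.
                return chosen
    # Otherwise fall back to reading every matched row group.
    return [i for i, kind in enumerate(kinds) if kind is not Match.PRUNED]
```

When the fully matched row groups hold at least k rows in total, the plan contains only those groups. Otherwise it falls back to every matched group, which is the current behavior.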


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

