xudong963 commented on issue #18860: URL: https://github.com/apache/datafusion/issues/18860#issuecomment-3563295086
> FWIW if the limit is pushed into the parquet reader, it will internally skip reading future row groups once the limit is reached. Here is some of the relevant code

If there's a filter, I think we still need to do row group pruning first. Then, for the matched row groups, we apply the row filters and collect rows until the limit is reached.

> What does "fully matches" /"partially matched" mean in this case? Does that mean all the rows in the row groups would be filtered?

Currently, row group pruning classifies each row group as either pruned or matched. What I mean here is further subdividing the matched row groups into partially matched (only some rows may pass the filter) and fully matched (every row passes the filter). If we use the fully matched row groups first to return the limit k rows, we can reduce the cost of fetching partially matched row groups and running row filters on them.
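
To make the idea concrete, here is a minimal sketch in Python of how the selection could work. It is not DataFusion code: `Match`, `RowGroup`, and `select_row_groups` are hypothetical names made up for illustration, and the sketch assumes the query has no ordering requirement, so any k rows that pass the filter are an acceptable answer.

```python
from dataclasses import dataclass
from enum import Enum


class Match(Enum):
    # Hypothetical classification produced by row group pruning.
    PRUNED = "pruned"    # statistics prove no row can pass the filter
    PARTIAL = "partial"  # some rows may pass; a row filter is still needed
    FULL = "full"        # statistics prove every row passes the filter


@dataclass
class RowGroup:
    index: int
    num_rows: int
    match: Match


def select_row_groups(row_groups, limit):
    """Return the indices of the row groups to read for `LIMIT limit`.

    If the fully matched row groups alone hold at least `limit` rows,
    only those are returned, so no partially matched group has to be
    fetched or row-filtered. Otherwise fall back to every matched group.
    """
    if limit <= 0:
        return []

    matched = [rg for rg in row_groups if rg.match is not Match.PRUNED]

    chosen, rows = [], 0
    for rg in matched:
        if rg.match is Match.FULL:
            chosen.append(rg.index)
            rows += rg.num_rows
            if rows >= limit:
                # The limit is satisfied without touching partial groups.
                return chosen

    # Not enough guaranteed rows: read all matched groups and apply
    # the row filter as usual.
    return [rg.index for rg in matched]


groups = [
    RowGroup(0, 100, Match.PRUNED),
    RowGroup(1, 100, Match.PARTIAL),
    RowGroup(2, 100, Match.FULL),
    RowGroup(3, 100, Match.FULL),
]
print(select_row_groups(groups, 150))  # only fully matched groups are needed
print(select_row_groups(groups, 250))  # falls back to all matched groups
```

With a limit of 150, the two fully matched groups already guarantee 200 qualifying rows, so the partially matched group 1 is never read. With a limit of 250 that guarantee no longer holds, and the sketch falls back to the current behavior of scanning every matched row group.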
