alamb commented on issue #18860: URL: https://github.com/apache/datafusion/issues/18860#issuecomment-3563342071
> > FWIW if the limit is pushed into the parquet reader, it will internally skip reading future row groups once the limit is reached. Here is some of the relevant code
>
> If there's a filter, I think we still need to do row group pruning, then for the matched row groups, do row filters and get the limit rows.

If there is a filter applied in the scan (via `pushdown_filters`), the parquet reader will stop (and not fetch any more row groups) once the limit is hit.

https://github.com/apache/arrow-rs/blob/ed9efe78e4cc958cc96707557818e754419debb0/parquet/src/arrow/push_decoder/reader_builder/mod.rs#L504-L518

I am probably not understanding what you are proposing. I'll try to read the PR.

> > > What does "fully matches" / "partially matched" mean in this case? Does that mean all the rows in the row groups would be filtered?

So "fully matches" means that no rows in the row group are filtered out. That makes sense.
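
To illustrate the early-stop behavior described above, here is a minimal Rust sketch. It is not the arrow-rs implementation: `RowGroup` and `scan_with_limit` are hypothetical names, and row groups are modeled as plain in-memory vectors rather than Parquet data. It only shows the idea that, with a filter and a limit both applied in the scan, later row groups are never read once enough rows have matched.

```rust
/// Hypothetical stand-in for a Parquet row group: just the rows it holds.
struct RowGroup {
    rows: Vec<i64>,
}

/// Scans row groups in order, applies `predicate` to each row, and stops
/// as soon as `limit` rows have matched.
///
/// Returns the matching rows and the number of row groups actually read,
/// so the caller can see that later row groups were skipped.
fn scan_with_limit<F>(groups: &[RowGroup], predicate: F, limit: usize) -> (Vec<i64>, usize)
where
    F: Fn(i64) -> bool,
{
    let mut out = Vec::new();
    let mut groups_read = 0;

    for group in groups {
        // Limit already satisfied: do not touch this or any later row group.
        if out.len() >= limit {
            break;
        }
        groups_read += 1;

        for &row in &group.rows {
            if predicate(row) {
                out.push(row);
                // Stop decoding the current row group once the limit is hit.
                if out.len() >= limit {
                    break;
                }
            }
        }
    }

    (out, groups_read)
}
```

For example, with three row groups holding `[1, 2, 3]`, `[4, 5, 6]`, and `[7, 8, 9]`, an "is even" predicate, and a limit of 2, the sketch reads only the first two row groups and never touches the third.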
