alamb commented on issue #18860: URL: https://github.com/apache/datafusion/issues/18860#issuecomment-3563221015
FWIW, if the limit is pushed into the parquet reader, it will internally skip reading subsequent row groups once the limit is reached. Here is some of the relevant code: https://github.com/apache/arrow-rs/blob/ed9efe78e4cc958cc96707557818e754419debb0/parquet/src/arrow/arrow_reader/read_plan.rs#L254-L294

In other words, unless applying the limit in DataFusion lets you skip additional files or I/O beyond what the reader already skips, I suspect using it to skip row groups might not improve performance much.

> And one (row group 3) of the four is fully matched with the predicates, and the others are not, which are only partially matched.

What do "fully matched" and "partially matched" mean in this case? Does that mean all the rows in the row groups would be filtered?
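
To illustrate the point about the reader stopping once the limit is reached, here is a minimal toy model. It is not the arrow-rs implementation linked above: the function name `plan_with_limit` and its per-row-group row-count input are hypothetical, and it ignores predicates and row selections entirely. It only shows why row groups after the one that satisfies the limit cost nothing to "read".

```rust
/// Toy model (hypothetical, not the arrow-rs code): given the number of rows
/// in each row group and a row limit, return how many rows would be read
/// from each row group when scanning in order.
fn plan_with_limit(row_group_rows: &[usize], limit: usize) -> Vec<usize> {
    // Rows still needed to satisfy the limit.
    let mut remaining = limit;
    let mut plan = Vec::with_capacity(row_group_rows.len());
    for &rows in row_group_rows {
        // Take at most what is left of the limit; once `remaining` hits 0,
        // every later row group contributes 0 rows, i.e. it is skipped.
        let take = rows.min(remaining);
        plan.push(take);
        remaining -= take;
    }
    plan
}
```

For example, with four row groups of 100 rows each and a limit of 150, `plan_with_limit(&[100, 100, 100, 100], 150)` returns `[100, 50, 0, 0]`: the last two row groups are never read, which is the skipping already described for the case where the limit is pushed into the reader.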
