jorisvandenbossche opened a new pull request, #39065:
URL: https://github.com/apache/arrow/pull/39065

   ### Rationale for this change
   
   Currently, when filtering with a nested field reference, we take the 
corresponding Parquet SchemaField for only the first index of the nested path, 
i.e. the parent node in the Parquet schema. But filtering on statistics only 
works for a primitive leaf node, because Parquet stores statistics per leaf 
column.
   
   This PR changes that logic to iterate over all indices of the FieldPath 
when it is nested, so that the statistics are read from the actual 
corresponding leaf node of the Parquet schema.
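
The path walk described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Arrow C++ code touched by this PR: the `SchemaField` dataclass, its `children` and `column_index` attributes, and the `resolve_leaf` helper are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class SchemaField:
    """Hypothetical stand-in for a Parquet schema node (not the Arrow C++ type)."""
    name: str
    children: List["SchemaField"] = field(default_factory=list)
    column_index: Optional[int] = None  # assumed to be set only on primitive leaves

    @property
    def is_leaf(self) -> bool:
        return not self.children


def resolve_leaf(top_level: Sequence[SchemaField],
                 path: Sequence[int]) -> Optional[SchemaField]:
    """Follow every index of a nested field path, not just the first one.

    Returns the node only if it is a primitive leaf, since only leaves
    carry statistics; returns None otherwise.
    """
    if not path:
        return None
    node = top_level[path[0]]      # the old behaviour stopped here
    for index in path[1:]:         # the new behaviour keeps descending
        node = node.children[index]
    return node if node.is_leaf else None


# Example schema: id: int, point: struct<x: int, y: int>
schema = [
    SchemaField("id", column_index=0),
    SchemaField("point", children=[
        SchemaField("x", column_index=1),
        SchemaField("y", column_index=2),
    ]),
]

parent_only = schema[1]                    # parent struct node, no statistics
leaf = resolve_leaf(schema, [1, 1])        # leaf node "y"
```

In this sketch, a path that stops at the struct (`[1]`) yields `None`, which mirrors the point that statistics-based filtering cannot use a non-leaf node.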
   
   ### Are there any user-facing changes?
   
   No, this only improves performance by doing the filtering at the row group 
stage instead of afterwards on the data that was read.
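
To illustrate why filtering at the row group stage is cheaper, here is a small sketch of min/max pruning for a `column > threshold` predicate. The `RowGroup` class, the statistics layout (keyed by leaf column index), and both helper functions are assumptions made up for this example; they do not correspond to the Arrow or Parquet APIs.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class RowGroup:
    """Hypothetical row group: (min, max) statistics per leaf column, plus rows."""
    stats: Dict[int, Tuple[int, int]]
    rows: List[Dict[int, int]]


def may_match_greater(rg: RowGroup, column_index: int, threshold: int) -> bool:
    """Return False only when the statistics prove no row can satisfy col > threshold."""
    bounds = rg.stats.get(column_index)
    if bounds is None:
        return True  # no statistics for this column: the row group cannot be skipped
    _, maximum = bounds
    return maximum > threshold


def scan_greater(row_groups: List[RowGroup], column_index: int, threshold: int):
    """Skip prunable row groups, then filter the rows of the remaining ones."""
    groups_read = 0
    matches = []
    for rg in row_groups:
        if not may_match_greater(rg, column_index, threshold):
            continue  # pruned using statistics alone, no data read
        groups_read += 1
        matches.extend(r for r in rg.rows if r[column_index] > threshold)
    return matches, groups_read


row_groups = [
    RowGroup(stats={2: (0, 9)}, rows=[{2: 0}, {2: 9}]),
    RowGroup(stats={2: (10, 19)}, rows=[{2: 10}, {2: 19}]),
]
matches, groups_read = scan_greater(row_groups, column_index=2, threshold=12)
```

Here only the second row group is read. If the statistics lookup fails, as it would when pointed at a parent node rather than a leaf, every row group has to be read and filtered afterwards; the result is the same but more data is scanned.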


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
