houqp commented on pull request #68:
URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-914006491


   > I am not familiar with this code-base, but I would expect that we create 
such mapping out of expressions: every row group is selected or not based on 
its metadata with statistics, and potentially its number in the parquet file 
for partitioned reads of a single file.
   
   Our `build_row_group_predicate` function does exactly this right now. The 
issue I am trying to resolve is that, in order to build this mapping, I need 
access to the metadata. So it looks like we would need to read the metadata 
manually to create the mapping, and then call `RecordReader::try_new` with the 
constructed `groups_filter` argument? This approach requires us to read the 
same metadata twice: once before calling `RecordReader::try_new`, and once 
inside `RecordReader::try_new`. Perhaps we could expose the metadata field 
from `RecordReader` and provide a `set_groups_filter` method to allow updating 
the row group filter after the reader has been created?
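
   To make the proposal concrete, here is a rough sketch of the flow I have in 
mind. It is a self-contained model, not the real API: `MockFile`, 
`RowGroupStats`, the `metadata()` accessor, `selected_row_groups`, and the 
simplified signatures of `try_new` and `build_row_group_predicate` are all 
assumptions made for illustration. Only the names `RecordReader::try_new`, 
`groups_filter`, `set_groups_filter`, and `build_row_group_predicate` come from 
the discussion above.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for the min/max statistics of one row group.
#[derive(Debug, Clone, PartialEq)]
struct RowGroupStats {
    min: i64,
    max: i64,
}

/// Hypothetical stand-in for a parquet file; counts how often metadata is read.
struct MockFile {
    row_groups: Vec<RowGroupStats>,
    metadata_reads: Cell<usize>,
}

impl MockFile {
    fn read_metadata(&self) -> Vec<RowGroupStats> {
        self.metadata_reads.set(self.metadata_reads.get() + 1);
        self.row_groups.clone()
    }
}

/// Decides, by row group index, whether a row group should be read.
type GroupsFilter = Box<dyn Fn(usize) -> bool>;

/// Simplified model of the reader (assumed shape, not the real struct).
struct RecordReader {
    metadata: Vec<RowGroupStats>,
    groups_filter: Option<GroupsFilter>,
}

impl RecordReader {
    /// Reads the metadata once, as the real constructor is described as doing.
    fn try_new(file: &MockFile, groups_filter: Option<GroupsFilter>) -> Result<Self, String> {
        Ok(Self { metadata: file.read_metadata(), groups_filter })
    }

    /// Proposed: expose the metadata the reader has already loaded.
    fn metadata(&self) -> &[RowGroupStats] {
        &self.metadata
    }

    /// Proposed: install or replace the row group filter after construction.
    fn set_groups_filter(&mut self, filter: GroupsFilter) {
        self.groups_filter = Some(filter);
    }

    /// Indices of the row groups that pass the filter (all of them if unset).
    fn selected_row_groups(&self) -> Vec<usize> {
        (0..self.metadata.len())
            .filter(|&i| self.groups_filter.as_ref().map_or(true, |f| f(i)))
            .collect()
    }
}

/// Stand-in for `build_row_group_predicate`, hard-wired to `column > lower_bound`:
/// a row group can be skipped when its max is not above the bound.
fn build_row_group_predicate(metadata: &[RowGroupStats], lower_bound: i64) -> GroupsFilter {
    let keep: Vec<bool> = metadata.iter().map(|rg| rg.max > lower_bound).collect();
    Box::new(move |i| keep.get(i).copied().unwrap_or(false))
}

/// Returns (number of metadata reads, selected row group indices).
fn example() -> (usize, Vec<usize>) {
    let file = MockFile {
        row_groups: vec![
            RowGroupStats { min: 0, max: 9 },
            RowGroupStats { min: 10, max: 19 },
            RowGroupStats { min: 20, max: 29 },
        ],
        metadata_reads: Cell::new(0),
    };
    // Create the reader first, without a filter; this is the only metadata read.
    let mut reader = RecordReader::try_new(&file, None).expect("mock never fails");
    // Build the filter from the reader's own metadata, then install it.
    let filter = build_row_group_predicate(reader.metadata(), 15);
    reader.set_groups_filter(filter);
    (file.metadata_reads.get(), reader.selected_row_groups())
}
```

   In this model the metadata is read once, and only the row groups whose 
statistics can satisfy the predicate are kept. With the current API, the same 
result would need a separate metadata read before `try_new` is called.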

