houqp commented on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-914006491
> I am not familiar with this code-base, but I would expect that we create such mapping out of expressions: every row group is selected or not based on its metadata with statistics, and potentially its number in the parquet file for partitioned reads of a single file.

Our `build_row_group_predicate` function does exactly this right now. The issue I am trying to resolve is that building this mapping requires access to the metadata. So it looks like we would need to read the metadata manually first to create the mapping, then call `RecordReader::try_new` with the constructed `groups_filter` argument? That approach reads the same metadata twice: once before calling `RecordReader::try_new`, and once inside it.

Perhaps we could expose the metadata field from `RecordReader` and provide a `set_groups_filter` method so that the row group filter can be updated after the reader has been created?
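To make the proposal concrete, here is a minimal sketch of what the suggested flow could look like. Everything in it is a hypothetical stand-in, not the real reader API: `FileMetaData`, `RowGroupMetaData`, and `RecordReader` are simplified mock types, `metadata()` and `set_groups_filter` are the two additions being proposed, and `build_groups_filter` is an illustrative helper (not our actual `build_row_group_predicate`) that keeps row groups whose maximum exceeds a threshold.

```rust
use std::sync::Arc;

/// Hypothetical stand-in: min/max statistics of one integer column in a row group.
#[derive(Debug, Clone)]
pub struct RowGroupMetaData {
    pub min: i64,
    pub max: i64,
}

/// Hypothetical stand-in for the parquet footer metadata.
#[derive(Debug, Clone)]
pub struct FileMetaData {
    pub row_groups: Vec<RowGroupMetaData>,
}

/// Decides from a row group's index and metadata whether it should be read.
pub type GroupFilter = Arc<dyn Fn(usize, &RowGroupMetaData) -> bool>;

/// Mock reader; the real one would parse the footer from a file in `try_new`.
pub struct RecordReader {
    metadata: FileMetaData,
    groups_filter: Option<GroupFilter>,
}

impl RecordReader {
    /// Stands in for the single place where the metadata is read.
    pub fn try_new(file: &FileMetaData, groups_filter: Option<GroupFilter>) -> Result<Self, String> {
        Ok(Self { metadata: file.clone(), groups_filter })
    }

    /// Proposed addition: expose the metadata the reader already holds.
    pub fn metadata(&self) -> &FileMetaData {
        &self.metadata
    }

    /// Proposed addition: install or replace the filter after construction.
    pub fn set_groups_filter(&mut self, groups_filter: Option<GroupFilter>) {
        self.groups_filter = groups_filter;
    }

    /// Indices of the row groups that pass the current filter (all if none is set).
    pub fn selected_row_groups(&self) -> Vec<usize> {
        let mut selected = Vec::new();
        for (i, rg) in self.metadata.row_groups.iter().enumerate() {
            let keep = match &self.groups_filter {
                Some(filter) => filter(i, rg),
                None => true,
            };
            if keep {
                selected.push(i);
            }
        }
        selected
    }
}

/// Illustrative helper: precompute, from the metadata, which row groups can
/// contain values greater than `threshold`, and return that as a filter.
pub fn build_groups_filter(metadata: &FileMetaData, threshold: i64) -> GroupFilter {
    let keep: Vec<bool> = metadata.row_groups.iter().map(|rg| rg.max > threshold).collect();
    Arc::new(move |i: usize, _rg: &RowGroupMetaData| keep.get(i).copied().unwrap_or(false))
}

/// Usage: create the reader first, then derive the filter from its metadata.
pub fn demo() -> Vec<usize> {
    let file = FileMetaData {
        row_groups: vec![
            RowGroupMetaData { min: 0, max: 9 },
            RowGroupMetaData { min: 10, max: 19 },
            RowGroupMetaData { min: 20, max: 29 },
        ],
    };
    let mut reader = RecordReader::try_new(&file, None).unwrap();
    let filter = build_groups_filter(reader.metadata(), 15);
    reader.set_groups_filter(Some(filter));
    reader.selected_row_groups()
}
```

With this shape the metadata is read only inside `try_new`, and the predicate is built from `reader.metadata()` afterwards, which is the double read I would like to avoid.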
