houqp commented on pull request #68: URL: https://github.com/apache/arrow-datafusion/pull/68#issuecomment-913898374
@jorgecarleitao we need to build the row group filter lambda using the parquet file metadata. I noticed that `RecordReader` already reads that metadata in https://github.com/jorgecarleitao/arrow2/blob/92e227706c1235ddf3ef62dcba440df241711f48/src/io/parquet/read/record_batch.rs#L40. Is there a way for us to reuse what has already been read there to build the row group filter? `RecordReader::try_new` requires the row group filter as an argument, so if I read the metadata manually before calling `try_new`, I would be reading the same bytes twice.
This is for migrating https://github.com/apache/arrow-datafusion/blob/7932cb9373192ce2754b39c1f82f22c8a56b7266/datafusion/src/physical_plan/parquet.rs#L539
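To make the request concrete, here is a minimal sketch of the shape being asked for: read the metadata once, build the filter from it, then hand the same parsed metadata to the reader. It does not use arrow2's actual API. `FileMetadata`, `RowGroupMeta`, `RowGroupFilter`, `read_metadata`, `build_row_group_filter`, and `Reader::try_new_with_metadata` are all hypothetical stand-ins invented for illustration.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the statistics of one row group.
#[derive(Debug, Clone)]
struct RowGroupMeta {
    min: i64,
    max: i64,
}

/// Hypothetical stand-in for the parsed parquet file metadata.
#[derive(Debug, Clone)]
struct FileMetadata {
    row_groups: Vec<RowGroupMeta>,
}

/// Hypothetical filter type: (row group index, row group metadata) -> keep it?
type RowGroupFilter = Arc<dyn Fn(usize, &RowGroupMeta) -> bool>;

/// Pretends to parse the file footer and counts how often that happens.
fn read_metadata(reads: &mut usize) -> FileMetadata {
    *reads += 1;
    FileMetadata {
        row_groups: vec![
            RowGroupMeta { min: 0, max: 9 },
            RowGroupMeta { min: 10, max: 19 },
            RowGroupMeta { min: 20, max: 29 },
        ],
    }
}

/// Builds the filter from metadata that was already read: keep only the
/// row groups whose [min, max] range can contain `value`.
fn build_row_group_filter(metadata: &FileMetadata, value: i64) -> RowGroupFilter {
    let keep: Vec<bool> = metadata
        .row_groups
        .iter()
        .map(|rg| rg.min <= value && value <= rg.max)
        .collect();
    Arc::new(move |idx: usize, _rg: &RowGroupMeta| keep.get(idx).copied().unwrap_or(false))
}

/// Hypothetical reader that records which row groups it would read.
struct Reader {
    selected: Vec<usize>,
}

impl Reader {
    /// Hypothetical constructor that accepts already-parsed metadata,
    /// so the footer does not have to be read a second time.
    fn try_new_with_metadata(
        metadata: FileMetadata,
        filter: Option<RowGroupFilter>,
    ) -> Result<Self, String> {
        let selected = metadata
            .row_groups
            .iter()
            .enumerate()
            .filter(|(i, rg)| match &filter {
                Some(f) => f(*i, *rg),
                None => true,
            })
            .map(|(i, _)| i)
            .collect();
        Ok(Reader { selected })
    }
}

fn main() {
    let mut reads = 0usize;
    let metadata = read_metadata(&mut reads); // single read of the footer
    let filter = build_row_group_filter(&metadata, 15);
    let reader = Reader::try_new_with_metadata(metadata, Some(filter)).unwrap();
    println!("footer reads: {}, selected row groups: {:?}", reads, reader.selected);
}
```

The only point of the sketch is the ordering: the filter depends on the metadata, so a constructor that takes the parsed metadata (or exposes it before the filter is required) avoids the double read.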
