Fokko commented on issue #33972: URL: https://github.com/apache/arrow/issues/33972#issuecomment-1416165705
> We don't have a format-agnostic concept of "read the metadata but cache it for use later so you don't have to read it again".

That's not a problem, as long as it stays cached in the fragment. The reads from the end of the file to fetch the footer are rather expensive (in terms of time), so we would love to eliminate that call. I went through the code and was able to pass the metadata down from the fragment to the reader: https://github.com/apache/arrow/pull/34015

> The simple ParquetFile interface for single files doesn't support filtering row groups with a filter, so that would be a step back from using `pq.read_table`?

I agree, we need to have predicate pushdown 👍🏻

> Longer term, you can probably just specify a [custom evolution strategy](https://github.com/apache/arrow/blob/apache-arrow-11.0.0/cpp/src/arrow/dataset/dataset.h#L254) (using parquet column IDs) and let pyarrow handle the expression conversion for you. Sadly, this feature is not yet ready (I'm working on it when I can. 🤞 for 12.0.0)

Let me know when something is ready, happy to test 👍🏻
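The two ideas discussed above can be shown with a minimal pure-Python model. The first is to read the footer once and keep it cached on the fragment. The second is to reuse that cached metadata for predicate pushdown. This is not pyarrow's API. `CachedFragment`, `FooterMetadata`, `RowGroupStats` and `fake_footer_reader` are hypothetical names. The model also assumes the footer can be reduced to per-row-group min/max statistics for integer columns, and that a row group can be skipped whenever its [min, max] range does not overlap the requested range.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical min/max statistics for one column in one row group."""
    min_value: int
    max_value: int


@dataclass
class FooterMetadata:
    """Hypothetical parsed footer: one {column name: stats} dict per row group."""
    row_groups: List[Dict[str, RowGroupStats]]


@dataclass
class CachedFragment:
    """Hypothetical fragment that parses the footer at most once and reuses it."""
    path: str
    read_footer: Callable[[str], FooterMetadata]
    _metadata: Optional[FooterMetadata] = field(default=None, repr=False)
    footer_reads: int = 0  # counts how often the expensive footer read ran

    @property
    def metadata(self) -> FooterMetadata:
        # Lazily read the footer on first access, then serve it from the cache.
        if self._metadata is None:
            self._metadata = self.read_footer(self.path)
            self.footer_reads += 1
        return self._metadata

    def row_groups_matching(self, column: str, lower: int, upper: int) -> List[int]:
        """Return indices of row groups whose [min, max] overlaps [lower, upper]."""
        keep = []
        for index, stats_by_column in enumerate(self.metadata.row_groups):
            stats = stats_by_column[column]
            # Skip the row group only if its value range cannot satisfy the filter.
            if stats.max_value >= lower and stats.min_value <= upper:
                keep.append(index)
        return keep


def fake_footer_reader(path: str) -> FooterMetadata:
    # Stands in for the expensive read from the end of the Parquet file.
    return FooterMetadata(row_groups=[
        {"id": RowGroupStats(0, 99)},
        {"id": RowGroupStats(100, 199)},
        {"id": RowGroupStats(200, 299)},
    ])


frag = CachedFragment("data.parquet", fake_footer_reader)
print(frag.row_groups_matching("id", 150, 250))  # [1, 2]
print(frag.row_groups_matching("id", 0, 10))     # [0]
print(frag.footer_reads)                         # 1
```

The footer is read lazily and at most once per fragment. Every later filter then reuses the cached statistics instead of issuing another read against the file.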
