pitrou commented on issue #43057: URL: https://github.com/apache/arrow/issues/43057#issuecomment-2497980432
> It looks like the `PageReader` API doesn't provide a way to parallelise reading though, you can only iterate over data pages sequentially, so I don't think this is a concern

It may not be obvious how to use it with other Parquet C++ APIs, but the `OffsetIndex` conceptually allows direct access to individual pages. So, ideally at least, and hopefully in the future, it will be possible to access individual data pages from a column in a non-sequential fashion. (cc @wgtmac @mapleFU)

https://github.com/apache/arrow/blob/71389f845ef5f2e71dfa566f0ab4bb2988f88a8f/cpp/src/parquet/page_index.h#L119-L132
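
To illustrate what "direct access to individual pages" could mean in practice, here is a minimal sketch. It does not use the Parquet C++ API: `PageLocationSketch` and `FindPageForRow` are hypothetical names, and the struct only mirrors the three fields the Parquet format defines for a page location (file offset, compressed page size, first row index). Assuming the locations are sorted by first row index, a binary search finds the page holding a given row, and the page's offset and size then give the byte range to read without touching earlier pages.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for one offset index entry (not the Arrow type).
struct PageLocationSketch {
  int64_t offset;                // byte offset of the page in the file
  int32_t compressed_page_size;  // page size in bytes, header included
  int64_t first_row_index;       // first row of the page within the row group
};

// Returns the index of the page that contains `row`, or -1 if there is none.
// Assumes `pages` is sorted by first_row_index. The row group's total row
// count is not known here, so any row at or past the last page's first row
// maps to the last page; a real reader would bound-check against it.
int64_t FindPageForRow(const std::vector<PageLocationSketch>& pages,
                       int64_t row) {
  if (pages.empty() || row < pages.front().first_row_index) {
    return -1;
  }
  assert(std::is_sorted(
      pages.begin(), pages.end(),
      [](const PageLocationSketch& a, const PageLocationSketch& b) {
        return a.first_row_index < b.first_row_index;
      }));
  // First page whose first_row_index is strictly greater than `row`;
  // the page just before it is the one containing `row`.
  auto it = std::upper_bound(
      pages.begin(), pages.end(), row,
      [](int64_t r, const PageLocationSketch& p) {
        return r < p.first_row_index;
      });
  return static_cast<int64_t>(it - pages.begin()) - 1;
}
```

Since each lookup depends only on the index entries, several pages could in principle be located and fetched independently, which is the non-sequential access pattern described above.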
