mapleFU commented on issue #34053: URL: https://github.com/apache/arrow/issues/34053#issuecomment-1422554647
> Perhaps it is not necessary to breakdown the IO to page, since Parquet-format states ColumnChunk is the IO unit.

Yes, the standard says so, but we can still use page-level IO. Currently the parquet-cpp implementation uses both page-level and chunk-level IO:

* Arrow can use `ReadRangeCache` to serve chunk-level IO; however, I don't think it provides good performance. Currently it reads the whole buffer and caches it in a `::arrow::io::BufferReader`.
* If no cache is used, an `ArrowInputStream` is created directly on the input, and `PageReader` tries to create a read buffer page by page.

Besides, I think the current implementation of `ReadRangeCache` is naive. I'm not sure it will work well.
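To make the difference between the two paths concrete, here is a minimal standalone sketch. It does not use the real Arrow or parquet-cpp APIs: `CountingFile`, `PageRange`, `ReadChunkAtOnce`, and `ReadPageByPage` are hypothetical names, and the page byte ranges are assumed to be known up front, sorted, and contiguous, which is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a random-access file that counts IO requests.
struct CountingFile {
  std::vector<uint8_t> bytes;
  std::size_t read_calls = 0;

  // Copies `length` bytes starting at `offset` and records one IO request.
  std::vector<uint8_t> ReadAt(std::size_t offset, std::size_t length) {
    assert(offset + length <= bytes.size());
    ++read_calls;
    const uint8_t* first = bytes.data() + offset;
    return std::vector<uint8_t>(first, first + length);
  }
};

// Hypothetical descriptor: byte range of one page inside the file.
struct PageRange {
  std::size_t offset;
  std::size_t length;
};

// Chunk-level IO: a single read covers the whole column chunk,
// and the pages are then sliced out of the in-memory buffer.
std::vector<std::vector<uint8_t>> ReadChunkAtOnce(
    CountingFile& file, const std::vector<PageRange>& pages) {
  std::vector<std::vector<uint8_t>> out;
  if (pages.empty()) return out;
  const std::size_t begin = pages.front().offset;
  const std::size_t end = pages.back().offset + pages.back().length;
  const std::vector<uint8_t> chunk = file.ReadAt(begin, end - begin);
  for (const PageRange& p : pages) {
    const uint8_t* first = chunk.data() + (p.offset - begin);
    out.emplace_back(first, first + p.length);
  }
  return out;
}

// Page-level IO: one read request is issued per page.
std::vector<std::vector<uint8_t>> ReadPageByPage(
    CountingFile& file, const std::vector<PageRange>& pages) {
  std::vector<std::vector<uint8_t>> out;
  for (const PageRange& p : pages) {
    out.push_back(file.ReadAt(p.offset, p.length));
  }
  return out;
}
```

Both functions return the same page bytes. In this toy model the only difference is the number of IO requests (one per chunk versus one per page) and how much memory is held at once, which is the trade-off discussed above.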
