mapleFU commented on issue #34053:
URL: https://github.com/apache/arrow/issues/34053#issuecomment-1422554647

   > Perhaps it is not necessary to break down the IO to the page level, since 
Parquet-format states ColumnChunk is the IO unit.
   
   Yes, the standard says so, but we can still use page-level IO. Currently the 
parquet-cpp implementation uses both page-level IO and chunk-level IO:
   
   * Arrow can use `ReadRangeCache` to serve chunk-level IO; however, I don't 
think it provides good performance. Currently it reads the whole buffer and 
caches it in a `::arrow::io::BufferReader`.
   * If no cache is used, an `ArrowInputStream` is created directly on the 
input, and `PageReader` tries to create the read buffer page by page.
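
To make the difference between the two paths concrete, here is a minimal sketch. It is not parquet-cpp code: `CountingFile`, `PageRange`, `ReadChunkLevel` and `ReadPageLevel` are hypothetical names made up for illustration, and the "file" is just an in-memory byte vector that counts how many reads are issued against it. It assumes the pages of a column chunk are contiguous and listed in file order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a random-access input; counts every read issued.
struct CountingFile {
  std::vector<uint8_t> data;
  int read_calls = 0;

  std::vector<uint8_t> ReadAt(std::size_t offset, std::size_t length) {
    assert(offset + length <= data.size());
    ++read_calls;
    auto first = data.begin() + static_cast<std::ptrdiff_t>(offset);
    return std::vector<uint8_t>(first, first + static_cast<std::ptrdiff_t>(length));
  }
};

// Hypothetical page descriptor: the byte range of one page inside the file.
struct PageRange {
  std::size_t offset;
  std::size_t length;
};

// Chunk-level IO: one read covering the whole column chunk, then every page
// is sliced out of the in-memory buffer (roughly the cached path above).
std::vector<std::vector<uint8_t>> ReadChunkLevel(CountingFile& file,
                                                 const std::vector<PageRange>& pages) {
  std::vector<std::vector<uint8_t>> out;
  if (pages.empty()) return out;
  const std::size_t begin = pages.front().offset;
  const std::size_t end = pages.back().offset + pages.back().length;
  const std::vector<uint8_t> chunk = file.ReadAt(begin, end - begin);
  for (const PageRange& p : pages) {
    auto first = chunk.begin() + static_cast<std::ptrdiff_t>(p.offset - begin);
    out.emplace_back(first, first + static_cast<std::ptrdiff_t>(p.length));
  }
  return out;
}

// Page-level IO: one read per page (roughly the uncached path above).
std::vector<std::vector<uint8_t>> ReadPageLevel(CountingFile& file,
                                                const std::vector<PageRange>& pages) {
  std::vector<std::vector<uint8_t>> out;
  for (const PageRange& p : pages) {
    out.push_back(file.ReadAt(p.offset, p.length));
  }
  return out;
}
```

With three pages, `ReadChunkLevel` issues a single read and holds the whole chunk in memory, while `ReadPageLevel` issues three smaller reads and only holds one page at a time. Both return the same page bytes, so the trade-off is number of IO calls versus peak buffer size.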
   
   Besides, I think the current implementation of `ReadRangeCache` is naive. 
I'm not sure it will work well.


