[GitHub] [parquet-mr] parthchandra commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

GitBox Tue, 24 May 2022 14:14:58 -0700


parthchandra commented on PR #968:
URL: https://github.com/apache/parquet-mr/pull/968#issuecomment-1136439133


   > Is byte (and arrays and buffers of bytes) the only datatype you support? 
My PR is optimizing code paths that pull ints, longs, and other sizes out of 
the data buffers. Are those not necessary for any of the situations where 
you're using an async buffer?
   The input stream API is generally unaware of the datatypes of its contents 
and so those are the only apis I use. The other reason is that the 
ParquetFileReader returns Pages which basically contain metadata and 
ByteBuffers of _compressed_ data. The decompression and decoding into types 
comes much later in a downstream thread.
   For your PR, I don't think the AsyncMultibufferInputStream is every going to 
be in play in the paths you're optimizing. But just in case it is, your type 
aware methods will work as is because AsyncMultibufferInputStream is derived 
from MultiBufferInputStream and will inherit those methods.  
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [parquet-mr] parthchandra commented on pull request #968: PARQUET-2149: Async IO implementation for ParquetFileReader

Reply via email to