[GitHub] [parquet-mr] steveloughran commented on pull request #1139: PARQUET-2171: Support Hadoop vectored IO

via GitHub Mon, 18 Sep 2023 10:29:30 -0700


steveloughran commented on PR #1139:
URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1724053099


   @danielcweeks that's a good point about pluggability.
   
   1. An interface/implementation split in Parquet would set you up to choose 
the back end later, maybe?
   2. I've done an initial pass at a shim library, 
https://github.com/apache/hadoop-api-shim, which uses vectored IO operations 
if the stream/Hadoop version supports them but falls back to the usual 
blocking reads if not (and does the same for everything else). But just 
getting the base vectored IO support into Parquet is a lot simpler. I don't 
know if the shim would be useful for Iceberg.
   3. There's also a video on the whole topic.
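To make points 1 and 2 concrete, here is a minimal Java sketch of an interface/implementation split with a blocking fallback. Every name in it (`Range`, `PositionedSource`, `RangeReader`, `BlockingRangeReader`, `select`) is hypothetical and not an existing Parquet or Hadoop API. A real vectored back end, for example one built on Hadoop's `readVectored`, would be passed in where `vectoredOrNull` is; this sketch only shows the shape of the split.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class RangeReaderSketch {
    /** Hypothetical byte range: file offset plus length. */
    static final class Range {
        final long offset;
        final int length;
        Range(long offset, int length) { this.offset = offset; this.length = length; }
    }

    /** Hypothetical positioned-read abstraction over a file or object-store stream. */
    interface PositionedSource {
        void readFully(long position, byte[] buffer, int off, int len) throws IOException;
    }

    /** Hypothetical pluggable back end: submit all ranges at once, get one future per range. */
    interface RangeReader {
        List<CompletableFuture<ByteBuffer>> readRanges(List<Range> ranges) throws IOException;
    }

    /** Fallback back end: ordinary blocking reads, one range after another. */
    static final class BlockingRangeReader implements RangeReader {
        private final PositionedSource source;

        BlockingRangeReader(PositionedSource source) { this.source = source; }

        @Override
        public List<CompletableFuture<ByteBuffer>> readRanges(List<Range> ranges) throws IOException {
            List<CompletableFuture<ByteBuffer>> results = new ArrayList<>();
            for (Range r : ranges) {
                byte[] buf = new byte[r.length];
                source.readFully(r.offset, buf, 0, r.length);
                // Futures are already complete: the blocking path has no asynchrony to offer.
                results.add(CompletableFuture.completedFuture(ByteBuffer.wrap(buf)));
            }
            return results;
        }
    }

    /** Use the vectored implementation when one is available, otherwise fall back. */
    static RangeReader select(PositionedSource source, RangeReader vectoredOrNull) {
        return vectoredOrNull != null ? vectoredOrNull : new BlockingRangeReader(source);
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);
        PositionedSource source = (pos, buf, off, len) -> System.arraycopy(data, (int) pos, buf, off, len);
        RangeReader reader = select(source, null); // no vectored back end in this demo
        List<CompletableFuture<ByteBuffer>> parts =
            reader.readRanges(List.of(new Range(2, 3), new Range(10, 4)));
        for (CompletableFuture<ByteBuffer> part : parts) {
            System.out.println(StandardCharsets.US_ASCII.decode(part.get())); // "234", then "abcd"
        }
    }
}
```

Callers only ever see `RangeReader`, so the choice between the blocking and vectored back ends can be made once, at open time, without touching the code that decides which ranges to read.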
   
   Getting Iceberg to pass down which stripes it wants to read is critical for 
this to work best with S3, ABFS and GCS. How are you reading the files at 
present?
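Why passing the ranges down matters: once the IO layer sees every range a reader needs up front, it can sort them, merge ranges separated by small gaps into one larger request, and issue the resulting requests in parallel, instead of one blocking GET per seek. Below is a minimal sketch of just the merge step, assuming hypothetical thresholds `maxGap` and `maxMerged`. Hadoop's vectored IO does its own merging internally with its own tunables, so this is illustrative, not its actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class CoalesceSketch {
    /** Hypothetical byte range [offset, offset + length). */
    static final class Range {
        final long offset;
        final long length;
        Range(long offset, long length) { this.offset = offset; this.length = length; }
        long end() { return offset + length; }
        @Override public String toString() { return "[" + offset + "," + end() + ")"; }
    }

    /**
     * Sort ranges by offset, then merge neighbours whose gap is at most maxGap,
     * as long as the merged request stays within maxMerged bytes.
     * Overlapping ranges have a negative gap and are always merge candidates.
     */
    static List<Range> coalesce(List<Range> input, long maxGap, long maxMerged) {
        List<Range> sorted = new ArrayList<>(input);
        sorted.sort((a, b) -> Long.compare(a.offset, b.offset));
        List<Range> out = new ArrayList<>();
        Range current = null;
        for (Range r : sorted) {
            if (current == null) { current = r; continue; }
            long gap = r.offset - current.end();
            long mergedEnd = Math.max(current.end(), r.end());
            if (gap <= maxGap && mergedEnd - current.offset <= maxMerged) {
                // Read the small gap too: one larger request beats two round trips.
                current = new Range(current.offset, mergedEnd - current.offset);
            } else {
                out.add(current);
                current = r;
            }
        }
        if (current != null) out.add(current);
        return out;
    }

    public static void main(String[] args) {
        List<Range> wanted = List.of(new Range(10_000, 100), new Range(0, 100), new Range(150, 50));
        System.out.println(coalesce(wanted, 100, 1_000_000)); // [[0,200), [10000,10100)]
    }
}
```

The first two ranges are 50 bytes apart and become one request; the third is far away and stays separate. Without the full list of ranges up front, the IO layer cannot make this decision.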


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
