Re: [PR] PARQUET-2171: Support Hadoop vectored IO [parquet-mr]

via GitHub Tue, 17 Oct 2023 09:35:22 -0700


parthchandra commented on PR #1139:
URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1766778525


   @ahmarsuhail No, these numbers are not with Iceberg and S3FileIO. 
   I used a stripped-down version of ParquetFileReader (lots of stuff removed) 
and a custom benchmark program that reads all the row groups in parallel and 
records the time spent in each read from S3. The modified ParquetFileReader 
can switch between the various methods of reading from S3.
   The entry `AWS SDK V2` is a near copy of the Iceberg S3FileIO code though. 
   When running at scale, I saw issues with the CRT client that caused JVM 
crashes, and the V2 transfer manager did not do range reads properly. Do share 
your experience.
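
For readers who want to try a similar comparison, here is a minimal sketch of 
that kind of harness in Python. It is not the benchmark used for the numbers in 
this thread. The names `RangeReader`, `time_parallel_reads`, `compare_readers` 
and `in_memory_reader` are hypothetical, and the in-memory reader only stands in 
for a real S3 client so the sketch runs without credentials.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical signature: read `length` bytes starting at `offset`.
RangeReader = Callable[[int, int], bytes]
ByteRange = Tuple[int, int]  # (offset, length), e.g. one per row group


def time_parallel_reads(
    reader: RangeReader, ranges: Sequence[ByteRange], max_workers: int = 8
) -> List[float]:
    """Read every range concurrently and return the seconds spent in each read."""

    def timed(byte_range: ByteRange) -> float:
        offset, length = byte_range
        start = time.perf_counter()
        data = reader(offset, length)
        elapsed = time.perf_counter() - start
        if len(data) != length:
            raise ValueError(f"short read at offset {offset}")
        return elapsed

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of `ranges` in the results.
        return list(pool.map(timed, ranges))


def compare_readers(
    readers: Dict[str, RangeReader], ranges: Sequence[ByteRange]
) -> Dict[str, List[float]]:
    """Run the same set of ranges through each named read method."""
    return {name: time_parallel_reads(r, ranges) for name, r in readers.items()}


def in_memory_reader(blob: bytes) -> RangeReader:
    """Stand-in for an S3 client (assumption: a real one would issue range GETs)."""

    def read(offset: int, length: int) -> bytes:
        return blob[offset:offset + length]

    return read
```

Swapping `in_memory_reader` for callables backed by different S3 clients would 
give per-read timings for each method over the same byte ranges, which is the 
shape of comparison described above.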


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
