[jira] [Commented] (PARQUET-2171) Implement vectored IO in parquet file format

ASF GitHub Bot (Jira) Tue, 17 Oct 2023 09:36:04 -0700


    [ 
https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776272#comment-17776272
 ]


ASF GitHub Bot commented on PARQUET-2171:
-----------------------------------------

parthchandra commented on PR #1139:
URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1766778525

   @ahmarsuhail No, these numbers are not with Iceberg and S3FileIO. 
   I used a modified version of the ParquetFileReader (with a lot of code removed) 
and a custom benchmark program that reads all the row groups in parallel and 
records the time spent in each read from S3. The modified version of 
ParquetFileReader can switch between the various methods of reading from S3.
   The entry `AWS SDK V2` is a near copy of the Iceberg S3FileIO code, though. 
   I saw issues with the CRT client when running at scale: it caused JVM crashes. 
The V2 transfer manager also did not do range reads properly. Do share your 
experience.
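
   For readers who want to try a similar measurement, here is a minimal sketch of the kind of harness described above: it reads a set of byte ranges in parallel and records the time spent in each read. It is not the benchmark used for the numbers in this PR. `RangeReader`, `RowGroupRange`, `ReadTiming` and `readAllInParallel` are hypothetical names, and the in-memory reader in `main` stands in for an S3 client; a real run would plug in one `RangeReader` per S3 access method being compared.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ReadTimingBenchmark {

    /** Hypothetical abstraction over one way of reading a byte range from storage. */
    public interface RangeReader {
        byte[] read(long offset, int length) throws Exception;
    }

    /** Byte range of one row group (assumed to come from the file footer in real code). */
    public static final class RowGroupRange {
        public final long offset;
        public final int length;

        public RowGroupRange(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Wall-clock time spent in a single read. */
    public static final class ReadTiming {
        public final RowGroupRange range;
        public final int bytesRead;
        public final long nanos;

        public ReadTiming(RowGroupRange range, int bytesRead, long nanos) {
            this.range = range;
            this.bytesRead = bytesRead;
            this.nanos = nanos;
        }
    }

    /** Submits one read per range to a fixed thread pool and times each read separately. */
    public static List<ReadTiming> readAllInParallel(
            RangeReader reader, List<RowGroupRange> ranges, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<ReadTiming>> tasks = new ArrayList<>();
            for (RowGroupRange r : ranges) {
                tasks.add(() -> {
                    long start = System.nanoTime();
                    byte[] data = reader.read(r.offset, r.length);
                    return new ReadTiming(r, data.length, System.nanoTime() - start);
                });
            }
            // invokeAll blocks until every task is done and returns futures in task order.
            List<ReadTiming> timings = new ArrayList<>();
            for (Future<ReadTiming> f : pool.invokeAll(tasks)) {
                timings.add(f.get());
            }
            return timings;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory stand-in for a remote object; replace with a real reader to benchmark.
        byte[] file = new byte[4096];
        RangeReader inMemory =
                (offset, length) -> Arrays.copyOfRange(file, (int) offset, (int) offset + length);
        List<RowGroupRange> ranges = Arrays.asList(
                new RowGroupRange(0, 1024), new RowGroupRange(1024, 2048));
        for (ReadTiming t : readAllInParallel(inMemory, ranges, 2)) {
            System.out.println(t.range.offset + "+" + t.bytesRead + " bytes: " + t.nanos + " ns");
        }
    }
}
```

   Keeping the reader behind a small interface is what lets a single harness switch between read methods without changing the timing code.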




> Implement vectored IO in parquet file format
> --------------------------------------------
>
>                 Key: PARQUET-2171
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2171
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Mukund Thakur
>            Priority: Major
>
> We recently added a new feature called vectored IO in Hadoop for improving 
> read performance for seek-heavy readers. Spark jobs and other applications 
> that use Parquet will greatly benefit from this API. Details can be found here: 
> [https://github.com/apache/hadoop/commit/e1842b2a749d79cbdc15c524515b9eda64c339d5]
> https://issues.apache.org/jira/browse/HADOOP-18103
> https://issues.apache.org/jira/browse/HADOOP-11867
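
The calling pattern behind a vectored read can be illustrated with the JDK alone: the caller hands over all the ranges it needs in one call and gets back one future per range, instead of issuing seek-and-read calls one at a time. The sketch below is only an illustration under that assumption. It does not use the Hadoop API, the names `VectoredReadSketch`, `Range` and `readVectored` are hypothetical, and it does none of the optimization a real implementation may perform (for example, combining nearby ranges into one request).

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VectoredReadSketch {

    /** Hypothetical description of one byte range to fetch. */
    public static final class Range {
        public final long offset;
        public final int length;

        public Range(long offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    /** Starts all range reads at once; returns one future per range, in the order given. */
    public static List<CompletableFuture<ByteBuffer>> readVectored(
            FileChannel channel, List<Range> ranges, ExecutorService pool) {
        List<CompletableFuture<ByteBuffer>> results = new ArrayList<>();
        for (Range r : ranges) {
            results.add(CompletableFuture.supplyAsync(() -> readFully(channel, r), pool));
        }
        return results;
    }

    /** Positional read: does not move the channel's position, so it is safe across threads. */
    private static ByteBuffer readFully(FileChannel channel, Range r) {
        ByteBuffer buf = ByteBuffer.allocate(r.length);
        long pos = r.offset;
        try {
            while (buf.hasRemaining()) {
                int n = channel.read(buf, pos);
                if (n < 0) {
                    throw new EOFException("range extends past end of file");
                }
                pos += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buf.flip();
        return buf;
    }

    public static void main(String[] args) throws Exception {
        // A small local file stands in for a remote object.
        Path file = Files.createTempFile("vectored", ".bin");
        Files.write(file, new byte[1024]);
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            List<Range> ranges = Arrays.asList(new Range(0, 100), new Range(512, 200));
            for (CompletableFuture<ByteBuffer> f : readVectored(channel, ranges, pool)) {
                System.out.println("read " + f.join().remaining() + " bytes");
            }
        } finally {
            pool.shutdown();
            Files.delete(file);
        }
    }
}
```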



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
