[ https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791061#comment-17791061 ]
ASF GitHub Bot commented on PARQUET-2171:
-----------------------------------------

steveloughran commented on PR #1139:
URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1831828739

   Code-wise, no, other than reviews from others about the best place for things, such as that awaitFuture stuff, or any other suggestions that people who know the parquet codebase think are best.

   The code works, and we have been testing it against Amazon S3 Express storage for extra speedup. To be ruthless: there's no point paying the premium for that until you've embraced the extra speedups you get from this first.

> Implement vectored IO in parquet file format
> --------------------------------------------
>
>                 Key: PARQUET-2171
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2171
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Mukund Thakur
>            Priority: Major
>
> We recently added a new feature called vectored IO in Hadoop to improve
> read performance for seek-heavy readers. Spark jobs and other applications
> that use Parquet will benefit greatly from this API. Details can be found here:
> [https://github.com/apache/hadoop/commit/e1842b2a749d79cbdc15c524515b9eda64c339d5]
> https://issues.apache.org/jira/browse/HADOOP-18103
> https://issues.apache.org/jira/browse/HADOOP-11867

--
This message was sent by Atlassian Jira
(v8.20.10#820010)