[ https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791061#comment-17791061 ]

ASF GitHub Bot commented on PARQUET-2171:
-----------------------------------------

steveloughran commented on PR #1139:
URL: https://github.com/apache/parquet-mr/pull/1139#issuecomment-1831828739

   Code-wise, no, other than reviews from others about where things best
belong (such as that awaitFuture stuff), or any other suggestions from people
who know the Parquet codebase. The code works, and we have been testing it
through Amazon S3 Express storage for an extra speed-up. To be ruthless:
there's no point paying the premium for that until you've first embraced the
extra speed-ups you get from this.




> Implement vectored IO in parquet file format
> --------------------------------------------
>
>                 Key: PARQUET-2171
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2171
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Mukund Thakur
>            Priority: Major
>
> We recently added a new feature called vectored IO to Hadoop to improve
> read performance for seek-heavy readers. Spark jobs and other applications
> that use Parquet will benefit greatly from this API. Details can be found here:
> [https://github.com/apache/hadoop/commit/e1842b2a749d79cbdc15c524515b9eda64c339d5]
> https://issues.apache.org/jira/browse/HADOOP-18103
> https://issues.apache.org/jira/browse/HADOOP-11867
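As a rough illustration of why seek-heavy readers such as Parquet benefit: a vectored read hands the file system the whole list of byte ranges up front (for example, the column chunks a reader needs from a row group), so object-store clients such as S3A can sort and coalesce nearby ranges into fewer, larger requests and fetch them in parallel. The Java sketch below shows only the coalescing step. It is not Hadoop's implementation, and the `Range` record, the `coalesce` method and the `minSeek`/`maxSize` thresholds are hypothetical names chosen for this example. In Hadoop 3.3.5 and later the real entry point is `PositionedReadable.readVectored()`, which takes a list of `FileRange` objects plus a buffer allocator and completes each range's `CompletableFuture` asynchronously.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: not Hadoop's or Parquet's code. Range, coalesce,
// minSeek and maxSize are hypothetical names chosen for this example.
public class RangeCoalescer {

    // A byte range [offset, offset + length) that a reader wants to fetch.
    record Range(long offset, long length) {
        long end() {
            return offset + length;
        }
    }

    // Sorts ranges by offset, then folds each range into the previous merged
    // range when the gap between them is at most minSeek bytes and the merged
    // span would stay within maxSize bytes. Overlapping ranges also merge.
    static List<Range> coalesce(List<Range> ranges, long minSeek, long maxSize) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong(Range::offset));

        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                long gap = r.offset() - last.end();
                long newEnd = Math.max(last.end(), r.end());
                if (gap <= minSeek && newEnd - last.offset() <= maxSize) {
                    // One larger request replaces two nearby ones; the gap
                    // bytes are read and discarded instead of paying for a seek.
                    merged.set(merged.size() - 1,
                            new Range(last.offset(), newEnd - last.offset()));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical column-chunk ranges, deliberately out of order.
        List<Range> wanted = List.of(
                new Range(10_000, 500),
                new Range(0, 1_000),
                new Range(1_200, 800),
                new Range(50_000, 2_000));
        List<Range> plan = coalesce(wanted, 4_096, 1 << 20);
        System.out.println(wanted.size() + " ranges -> " + plan.size() + " reads: " + plan);
    }
}
```

Running `main` merges the ranges at offsets 0 and 1,200 (a 200-byte gap) into a single 2,000-byte read, while the distant ranges stay separate, so four reads become three. The `maxSize` cap keeps a merged request from growing so large that it mostly fetches bytes nobody asked for.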



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
