[jira] [Commented] (PARQUET-2171) Implement vectored IO in parquet file format

ASF GitHub Bot (Jira) Thu, 30 Nov 2023 05:48:05 -0800


    [ 
https://issues.apache.org/jira/browse/PARQUET-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17791631#comment-17791631
 ]


ASF GitHub Bot commented on PARQUET-2171:
-----------------------------------------

steveloughran commented on code in PR #1139:
URL: https://github.com/apache/parquet-mr/pull/1139#discussion_r1410699415


##########
parquet-hadoop/README.md:
##########
@@ -501,3 +501,11 @@ If `false`, key material is stored in separate new files, 
created in the same fo
 **Description:** Length of key encryption keys (KEKs), randomly generated by 
parquet key management tools. Can be 128, 192 or 256 bits.  
 **Default value:** `128`
 
+---
+
+**Property:** `parquet.hadoop.vectored.io.enabled`  
+**Description:** Flag to enable use of the FileSystem Vector IO API on Hadoop 
releases which support the feature.
+If `true` then an attempt will be made to dynamically load the relevant 
classes; 

Review Comment:
   No, HDFS doesn't support it. Native IO does, so if you use file:// URLs you 
get direct NIO vectored IO into buffers (yay! Hadoop APIs move to the 2010s!). 
S3A supports it with multiple parallel GET requests, with some range coalescing 
in between. Would love the ABFS connector to support it too...
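
For readers unfamiliar with the "range coalescing" mentioned above, here is a minimal sketch of the general idea: sort the requested byte ranges, then merge neighbours whose gap is small enough that one larger read is cheaper than two separate requests. This is an illustration only, not the S3A connector's actual code; the `RangeCoalescer` class, its `Range` type, and the `maxGap` threshold are hypothetical names chosen for this example and are not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper for illustration; not part of Hadoop or parquet-mr.
final class RangeCoalescer {

    /** A byte range [offset, offset + length). Not Hadoop's FileRange. */
    static final class Range {
        public final long offset;
        public final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }

        @Override
        public String toString() {
            return offset + "+" + length;
        }
    }

    /**
     * Merges ranges whose gap to the previous range is at most maxGap bytes.
     * Overlapping ranges are merged as well. The input list is not modified.
     */
    static List<Range> coalesce(List<Range> ranges, long maxGap) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong((Range r) -> r.offset));

        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                if (r.offset - last.end() <= maxGap) {
                    // Close enough: extend the previous range to cover this one.
                    long end = Math.max(last.end(), r.end());
                    merged.set(merged.size() - 1,
                            new Range(last.offset, end - last.offset));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Range> wanted = List.of(
                new Range(0, 100),
                new Range(150, 50),      // 50 bytes after the first range
                new Range(10_000, 200)); // far away, stays separate
        // prints [0+200, 10000+200]
        System.out.println(coalesce(wanted, 64));
    }
}
```

With a threshold of 64 bytes, the first two ranges collapse into a single read and the third is fetched on its own, so a store that issues one GET per merged range would make two requests instead of three.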





> Implement vectored IO in parquet file format
> --------------------------------------------
>
>                 Key: PARQUET-2171
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2171
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-mr
>            Reporter: Mukund Thakur
>            Priority: Major
>
> We recently added a new feature called vectored IO in Hadoop for improving 
> read performance for seek-heavy readers. Spark jobs and other applications 
> that use Parquet will benefit greatly from this API. Details can be found here: 
> [https://github.com/apache/hadoop/commit/e1842b2a749d79cbdc15c524515b9eda64c339d5]
> https://issues.apache.org/jira/browse/HADOOP-18103
> https://issues.apache.org/jira/browse/HADOOP-11867



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
