[jira] [Commented] (HADOOP-18884) [ABFS] Support VectorIO in ABFS Input Stream

Arnaud Nauwynck (Jira) Sat, 23 Nov 2024 09:30:14 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-18884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900604#comment-17900604
 ]


Arnaud Nauwynck commented on HADOOP-18884:
------------------------------------------

Please see the comment in the (duplicate) Jira issue HADOOP-19345.

Even if abfss did not support multiple GET requests, merging nearly 
consecutive read requests and ignoring the data holes between them is NOT a 
problem. Indeed, it is much more efficient to read 8 MB more in one Azure 
request than to open a new HTTPS connection (TCP/IP connection + TLS handshake 
+ a small request, even one of 0 bytes).
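
To illustrate the merging idea, here is a minimal sketch in plain Java. It is 
not the ABFS or Hadoop implementation: the class RangeCoalescer, its Range 
type, and the maxGap parameter are hypothetical names chosen for this example. 
It sorts the requested ranges by offset and folds a range into the previous one 
whenever the hole between them is at most maxGap bytes, so the hole is read and 
discarded instead of paying for another request.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, for illustration only (not part of Hadoop).
final class RangeCoalescer {

    /** A byte range [offset, offset + length) requested by the caller. */
    static final class Range {
        final long offset;
        final long length;

        Range(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        long end() {
            return offset + length;
        }

        @Override
        public String toString() {
            return "[" + offset + ", " + end() + ")";
        }
    }

    /**
     * Merge ranges whose gap to the previous merged range is <= maxGap bytes.
     * The input list is not modified; the result is sorted by offset.
     */
    static List<Range> coalesce(List<Range> ranges, long maxGap) {
        List<Range> sorted = new ArrayList<>(ranges);
        sorted.sort(Comparator.comparingLong(r -> r.offset));

        List<Range> merged = new ArrayList<>();
        for (Range r : sorted) {
            if (!merged.isEmpty()) {
                Range last = merged.get(merged.size() - 1);
                // Gap is negative or zero for overlapping/adjacent ranges.
                if (r.offset - last.end() <= maxGap) {
                    long newEnd = Math.max(last.end(), r.end());
                    merged.set(merged.size() - 1,
                               new Range(last.offset, newEnd - last.offset));
                    continue;
                }
            }
            merged.add(r);
        }
        return merged;
    }
}
```

For example, with maxGap set to 8 MiB, three 1 MiB reads at offsets 0, 4 MiB 
and 100 MiB would become two requests: one covering 0 to 5 MiB (including a 
3 MiB hole) and one for the read at 100 MiB. The caller would then slice each 
original range out of the merged buffer.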



> [ABFS] Support VectorIO in ABFS Input Stream
> --------------------------------------------
>
>                 Key: HADOOP-18884
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18884
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/azure
>    Affects Versions: 3.3.9
>            Reporter: Steve Loughran
>            Assignee: Anmol Asrani
>            Priority: Major
>
> The Hadoop vector IO APIs are supported in file:// and s3a://; there's a Hive 
> ORC patch for this, and PARQUET-2171 adds it for Parquet, after which all apps 
> using the library with a matching Hadoop version and the feature enabled will 
> get a significant speedup.
> abfs needs to support it too, which needs support for parallel GET requests 
> for different ranges.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
