[
https://issues.apache.org/jira/browse/HADOOP-18884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17900604#comment-17900604
]
Arnaud Nauwynck edited comment on HADOOP-18884 at 11/23/24 5:24 PM:
--------------------------------------------------------------------
Please see the comment in the (duplicate) Jira issue [#HADOOP-19345]
Even if abfss did not support multiple parallel GET requests, nothing prevents
merging nearly consecutive read requests and ignoring the data holes between them.
Indeed, it is much more efficient to read 8 MB more in one Azure request than to
open a new HTTPS connection (TCP/IP connection + TLS handshake + a small request,
even one of 0 bytes).
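
To illustrate the merging idea, here is a minimal Python sketch. It is not the Hadoop or ABFS implementation; the function name merge_ranges, the 8 MB gap threshold and the 16 MB cap are assumptions taken from the figures mentioned in this comment (the 16 MB limit is itself unconfirmed).

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes

MB = 1024 * 1024


def merge_ranges(ranges: List[Range], max_gap: int, max_size: int) -> List[Range]:
    """Coalesce ranges separated by at most max_gap bytes.

    A merged read is never allowed to grow beyond max_size bytes.
    The bytes in the holes between merged ranges are read and discarded.
    """
    merged: List[Range] = []
    for offset, length in sorted(ranges):
        if merged:
            prev_offset, prev_length = merged[-1]
            prev_end = prev_offset + prev_length
            new_end = max(prev_end, offset + length)
            gap_ok = offset - prev_end <= max_gap
            size_ok = new_end - prev_offset <= max_size
            if gap_ok and size_ok:
                # Extend the previous read to cover this range and the hole.
                merged[-1] = (prev_offset, new_end - prev_offset)
                continue
        merged.append((offset, length))
    return merged


# Hypothetical read pattern: two nearby ranges and one far away.
reads = [(0, 1 * MB), (3 * MB, 1 * MB), (40 * MB, 2 * MB)]
print(merge_ranges(reads, max_gap=8 * MB, max_size=16 * MB))
```

With these inputs the first two ranges collapse into a single 4 MB read starting at offset 0, while the range at 40 MB stays a separate request because the hole is larger than the threshold.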
Note also that Azure requests are limited to 16 MB (?), but are billed in
multiples of 4 MB.
So if you read only 1 byte, you are billed for the full 4 MB anyway.
See the Azure pricing documentation:
[https://azure.microsoft.com/en-us/pricing/details/storage/blobs/|https://azure.microsoft.com/en-us/pricing/details/storage/blobs/]
{noformat}
When using ADLS Gen2 API for transactions, read and write transactions occur
for every 4 MB of data.
{noformat}
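
The billing arithmetic can be sketched as follows. This assumes the quoted sentence means one transaction per started 4 MB of data in a request, with at least one transaction per request; the function name billed_transactions is hypothetical and this is not an official Azure formula.

```python
MB = 1024 * 1024
BILLING_UNIT = 4 * MB  # from the quoted pricing note: one transaction per 4 MB


def billed_transactions(num_bytes: int) -> int:
    """Transactions billed for one request, assuming rounding up to 4 MB units."""
    # Ceiling division; assume even a tiny request costs one transaction.
    return max(1, -(-num_bytes // BILLING_UNIT))


# Two separate 1 MB reads (at offsets 0 and 3 MB) versus one merged 4 MB read.
separate = billed_transactions(1 * MB) + billed_transactions(1 * MB)
merged = billed_transactions(4 * MB)
print(separate, merged)
```

Under these assumptions the two separate reads cost two transactions and the merged read costs one, so reading the 2 MB hole is cheaper than skipping it.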
> [ABFS] Support VectorIO in ABFS Input Stream
> --------------------------------------------
>
> Key: HADOOP-18884
> URL: https://issues.apache.org/jira/browse/HADOOP-18884
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.3.9
> Reporter: Steve Loughran
> Assignee: Anmol Asrani
> Priority: Major
>
> the hadoop vector IO APIs are supported in file:// and s3a://; there's a hive
> ORC patch for this and PARQUET-2171 adds it for parquet, after which all apps
> using the library with a matching hadoop version and the feature enabled will
> get a significant speedup.
> abfs needs to support it too, which needs support for parallel GET requests for
> different ranges