[
https://issues.apache.org/jira/browse/HADOOP-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Steve Loughran updated HADOOP-19021:
------------------------------------
Summary: [ABFS] move to jdk11 HttpClient for http2 and connection keep
alive (was: in hadoop-azure, use jdk11 HttpClient instead of legacy
java.net.HttpURLConnection, for supporting http2 and connection keep alive)
> [ABFS] move to jdk11 HttpClient for http2 and connection keep alive
> -------------------------------------------------------------------
>
> Key: HADOOP-19021
> URL: https://issues.apache.org/jira/browse/HADOOP-19021
> Project: Hadoop Common
> Issue Type: Improvement
> Affects Versions: 3.4.0
> Reporter: Arnaud Nauwynck
> Priority: Critical
>
> As described in Jira Title: "in hadoop-azure, use jdk11 HttpClient instead of
> legacy java.net.HttpURLConnection, for supporting http2 and connection keep
> alive"
> A few remarks:
> 1/ The official Azure SDK supports either OkHttp or Netty for the HTTP
> transport.
> 2/ The current hadoop-azure uses the class java.net.HttpURLConnection, which
> is slow.
> It does not use HTTP/2, does not optimize the SSL handshake very well, and
> does not keep TCP connections alive for reuse.
> 3/ JDK 11 and later provide a new class, java.net.http.HttpClient, which
> should be a better replacement.
> 4/ It might be possible to introduce a configuration property (defaulting to
> the legacy class) and an abstract factory that creates connections via either
> HttpURLConnection or any other pluggable implementation (JDK 11 HttpClient,
> OkHttp, Netty, ...).
> 5/ The official Azure SDK is maintained by Microsoft, so it should keep up
> with bug fixes and improvements better than a custom Hadoop implementation?
> https://learn.microsoft.com/en-us/java/api/overview/azure/storage-file-datalake-readme?view=azure-java-stable
> 6/ When we use code with the official Azure SDK and Hadoop (in Spark), it is
> shocking to have two different implementations within the same JVM.
> 7/ The official Azure SDK offers more features than the legacy Hadoop
> FileSystem class allows. In particular, a file can be appended (= uploaded)
> by multiple threads (uploading fragments at different offsets), then flushed
> once every fragment has been sent.
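
Remark 4/ above could be sketched roughly as follows. This is only an illustration of the idea, not code from hadoop-azure: the interface AbfsHttpTransport, the two implementations, the factory, and the property name "fs.azure.http.transport" are all hypothetical names invented for this example, and a plain Map stands in for the Hadoop Configuration. Only the java.net and java.net.http calls are real JDK APIs.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Map;

/** Hypothetical transport abstraction; not an existing hadoop-azure type. */
interface AbfsHttpTransport {
    String name();

    /** Sends a HEAD request and returns the HTTP status code. */
    int head(URI uri) throws IOException, InterruptedException;
}

/** Legacy path: one HttpURLConnection per request. */
final class LegacyTransport implements AbfsHttpTransport {
    public String name() { return "legacy"; }

    public int head(URI uri) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
        conn.setRequestMethod("HEAD");
        try {
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}

/** JDK 11 path: a single shared HttpClient that prefers HTTP/2. */
final class Jdk11Transport implements AbfsHttpTransport {
    private final HttpClient client = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_2) // falls back to HTTP/1.1 if the server lacks HTTP/2
            .build();

    public String name() { return "jdk11"; }

    public int head(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .method("HEAD", HttpRequest.BodyPublishers.noBody())
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }
}

/** Factory that picks the transport from a configuration property. */
final class AbfsHttpTransportFactory {
    /** Hypothetical property name, assumed for this sketch only. */
    static final String KEY = "fs.azure.http.transport";

    static AbfsHttpTransport create(Map<String, String> conf) {
        // Default to the legacy class, as suggested in remark 4/.
        String impl = conf.getOrDefault(KEY, "legacy");
        switch (impl) {
            case "legacy":
                return new LegacyTransport();
            case "jdk11":
                return new Jdk11Transport();
            default:
                throw new IllegalArgumentException("Unknown transport: " + impl);
        }
    }
}
```

With this shape, existing deployments would keep the legacy behaviour unless the property is set, and other transports (OkHttp, Netty) could be added as further cases without touching the callers.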