[ 
https://issues.apache.org/jira/browse/HADOOP-19021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HADOOP-19021:
------------------------------------
    Summary: [ABFS] move to jdk11 HttpClient for http2 and connection keep 
alive  (was: in hadoop-azure, use jdk11 HttpClient instead of legacy 
java.net.HttpURLConnection, for supporting http2 and connection keep alive)

> [ABFS] move to jdk11 HttpClient for http2 and connection keep alive
> -------------------------------------------------------------------
>
>                 Key: HADOOP-19021
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19021
>             Project: Hadoop Common
>          Issue Type: Improvement
>    Affects Versions: 3.4.0
>            Reporter: Arnaud Nauwynck
>            Priority: Critical
>
> As described in the Jira title: "in hadoop-azure, use jdk11 HttpClient instead 
> of legacy java.net.HttpURLConnection, for supporting http2 and connection keep 
> alive".
> A few remarks:
> 1/ The official Azure SDK supports either OkHttp or Netty for the HTTP 
> transport.
> 2/ The current hadoop-azure code uses the class java.net.HttpURLConnection, 
> which is slow: it does not support HTTP/2, does not handle the SSL handshake 
> efficiently, and does not keep TCP connections alive for re-use.
> 3/ Since version 11, the JDK provides a new class java.net.http.HttpClient 
> which should be a better replacement (see the first sketch after these 
> remarks).
> 4/ It might be possible to introduce a configuration property (defaulting to 
> the legacy class) and an abstract factory to create connections via either 
> HttpURLConnection or any other pluggable implementation (jdk 11 HttpClient, 
> OkHttp, Netty, ...); see the second sketch after these remarks.
> 5/ The official Azure SDK is maintained by Microsoft, so it should track bug 
> fixes and improvements better than a custom hadoop implementation:
> https://learn.microsoft.com/en-us/java/api/overview/azure/storage-file-datalake-readme?view=azure-java-stable
> 6/ When code uses both the official Azure SDK and Hadoop (in Spark), it is 
> surprising to have 2 different HTTP client implementations within the same JVM.
> 7/ The official Azure SDK offers more features than the legacy hadoop 
> FileSystem class allows. In particular, a file can be appended (uploaded) by 
> multiple threads, each uploading a fragment at a different offset, followed by 
> a single flush once all fragments have been sent (see the last sketch below).
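>
> For remark 3/, here is a minimal sketch (my assumptions, not existing 
> hadoop-azure code) of a request through the JDK 11 java.net.http.HttpClient; 
> the endpoint, header value and the idea of sharing one client instance are 
> only illustrative:
> {code:java}
> import java.net.URI;
> import java.net.http.HttpClient;
> import java.net.http.HttpRequest;
> import java.net.http.HttpResponse;
> import java.time.Duration;
>
> public class JdkHttpClientSketch {
>
>     // One shared client: it keeps TCP/TLS connections alive in a pool
>     // and negotiates HTTP/2 when the server supports it.
>     private static final HttpClient CLIENT = HttpClient.newBuilder()
>             .version(HttpClient.Version.HTTP_2)        // falls back to HTTP/1.1
>             .connectTimeout(Duration.ofSeconds(30))
>             .build();
>
>     public static void main(String[] args) throws Exception {
>         // Hypothetical ABFS-style endpoint and header, for illustration only.
>         HttpRequest request = HttpRequest.newBuilder()
>                 .uri(URI.create("https://myaccount.dfs.core.windows.net/container/file?action=getStatus"))
>                 .header("x-ms-version", "2021-06-08")
>                 .GET()
>                 .build();
>
>         HttpResponse<String> response =
>                 CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
>         System.out.println(response.statusCode() + " over " + response.version());
>     }
> }
> {code}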
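>
> For remark 4/, a rough sketch of what a configuration-driven abstract factory 
> could look like; the property name, interfaces and class names are all 
> hypothetical, not an existing hadoop-azure API:
> {code:java}
> import java.io.IOException;
> import java.net.HttpURLConnection;
> import java.net.URL;
>
> // Hypothetical abstraction over the HTTP transport used by the ABFS client.
> interface AbfsHttpConnection {
>     int getStatusCode() throws IOException;
>     // ... request headers, request/response streams, etc.
> }
>
> interface AbfsHttpConnectionFactory {
>     AbfsHttpConnection openConnection(URL url, String method) throws IOException;
> }
>
> // Legacy implementation, kept as the default for compatibility.
> class UrlConnectionFactory implements AbfsHttpConnectionFactory {
>     @Override
>     public AbfsHttpConnection openConnection(URL url, String method) throws IOException {
>         HttpURLConnection conn = (HttpURLConnection) url.openConnection();
>         conn.setRequestMethod(method);
>         return conn::getResponseCode;   // minimal wrapper around the legacy class
>     }
> }
>
> final class AbfsHttpConnectionFactories {
>     // Hypothetical configuration key; the default keeps today's behaviour.
>     static final String HTTP_CLIENT_IMPL_KEY = "fs.azure.http.client.impl";
>
>     static AbfsHttpConnectionFactory create(String impl) {
>         switch (impl) {
>             case "jdk-httpclient":  // would return a JDK 11 HttpClient backed factory
>             case "okhttp":          // or an OkHttp / Netty backed one, etc.
>                 throw new UnsupportedOperationException("not sketched here: " + impl);
>             case "legacy":
>             default:
>                 return new UrlConnectionFactory();
>         }
>     }
> }
> {code}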
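>
> For remark 7/, a minimal sketch of a multi-threaded upload with the official 
> azure-storage-file-datalake SDK: each thread appends one fragment at its own 
> offset, then a single flush at the final length commits the file. Account, 
> container, path and credential wiring are placeholders, and the exact flush 
> overload may differ between SDK versions:
> {code:java}
> import com.azure.storage.common.StorageSharedKeyCredential;
> import com.azure.storage.file.datalake.DataLakeFileClient;
> import com.azure.storage.file.datalake.DataLakeServiceClient;
> import com.azure.storage.file.datalake.DataLakeServiceClientBuilder;
>
> import java.io.ByteArrayInputStream;
> import java.util.ArrayList;
> import java.util.List;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.Future;
>
> public class ParallelAppendSketch {
>     public static void main(String[] args) throws Exception {
>         // Hypothetical account, container and path, for illustration only.
>         DataLakeServiceClient service = new DataLakeServiceClientBuilder()
>                 .endpoint("https://myaccount.dfs.core.windows.net")
>                 .credential(new StorageSharedKeyCredential("myaccount", System.getenv("AZURE_KEY")))
>                 .buildClient();
>         DataLakeFileClient file =
>                 service.getFileSystemClient("container").getFileClient("dir/big-file.bin");
>         file.create();
>
>         byte[][] fragments = { "part-0".getBytes(), "part-1".getBytes(), "part-2".getBytes() };
>
>         ExecutorService pool = Executors.newFixedThreadPool(fragments.length);
>         List<Future<?>> uploads = new ArrayList<>();
>         long offset = 0;
>         for (byte[] fragment : fragments) {
>             final long fragmentOffset = offset;
>             // Each thread appends its fragment at its own offset; nothing is visible yet.
>             uploads.add(pool.submit(() -> file.append(
>                     new ByteArrayInputStream(fragment), fragmentOffset, fragment.length)));
>             offset += fragment.length;
>         }
>         for (Future<?> upload : uploads) {
>             upload.get();             // wait until every fragment has been sent
>         }
>         pool.shutdown();
>
>         // A single flush at the final length commits all the fragments.
>         file.flush(offset, true);
>     }
> }
> {code}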



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
