[
https://issues.apache.org/jira/browse/HADOOP-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mukund Thakur updated HADOOP-19120:
-----------------------------------
Release Note:
Apache httpclient 4.5.x is a new implementation of http connections; this
supports a large configurable pool of connections along with the ability to
limit their lifespan.
The networking library can be chosen using the configuration
option fs.azure.networking.library.
The supported values are:
- JDK_HTTP_URL_CONNECTION : Use JDK networking library [Default]
- APACHE_HTTP_CLIENT : Use Apache HttpClient
Important: when the networking library is switched to
the Apache HTTP client, the Apache httpcore and httpclient libraries must be
on the classpath.
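
The classpath requirement above can be checked before a value is chosen for fs.azure.networking.library. The following is a minimal sketch, not part of the ABFS code: the class NetworkingLibraryChoice and its methods are hypothetical, and it assumes that org.apache.http.client.HttpClient (httpclient 4.x) and org.apache.http.HttpRequest (httpcore 4.x) are adequate probes for the two libraries.

```java
/** Hypothetical helper: suggests a value for fs.azure.networking.library. */
final class NetworkingLibraryChoice {

    /** Configuration key named in the release note. */
    static final String CONFIG_KEY = "fs.azure.networking.library";

    /** Returns true if the named class can be located without initialising it. */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false,
                NetworkingLibraryChoice.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Picks the Apache client only when both probe classes are present. */
    static String choose() {
        boolean apacheAvailable =
            isOnClasspath("org.apache.http.client.HttpClient")   // from httpclient
                && isOnClasspath("org.apache.http.HttpRequest"); // from httpcore
        return apacheAvailable ? "APACHE_HTTP_CLIENT" : "JDK_HTTP_URL_CONNECTION";
    }

    public static void main(String[] args) {
        // Prints a key=value pair that could be copied into the site configuration.
        System.out.println(CONFIG_KEY + "=" + choose());
    }
}
```

In a Hadoop application the resulting value would typically be passed to the job's Configuration object under the key above; that step is omitted here so the sketch has no dependencies.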
was:
Apache httpclient 4.5.x is the new default implementation of http connections;
this supports a large configurable pool of connections along with the ability to
limit their lifespan.
The networking library can be chosen using the configuration
option fs.azure.networking.library
The supported values are
- APACHE_HTTP_CLIENT : Use Apache HttpClient [Default]
- JDK_HTTP_URL_CONNECTION : Use JDK networking library
Important: unless the networking library is switched back to
the JDK, the apache httpcore and httpclient must be on the classpath
> [ABFS]: ApacheHttpClient adaptation as network library
> ------------------------------------------------------
>
> Key: HADOOP-19120
> URL: https://issues.apache.org/jira/browse/HADOOP-19120
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.5.0
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Apache HttpClient is more feature-rich and flexible and gives applications
> more granular control over networking parameters.
> ABFS currently relies on the JDK-net library. This library is managed by
> OpenJDK and has no performance problem. However, it limits the application's
> control over networking, and it exposes very few APIs and hooks that the
> application can use to get metrics or to choose which connection is reused
> and when. ApacheHttpClient provides hooks to fetch important metrics and
> control networking parameters.
> A custom implementation of the connection pool is used. The implementation is
> adapted from the JDK8 connection pooling. Reasons for doing it:
> 1. The PoolingHttpClientConnectionManager heuristic caches all the reusable
> connections it has created. The JDK's implementation caches only a limited
> number of connections. The limit is given by the JVM system property
> "http.maxConnections". If the system property is not set, it defaults to 5.
> Connection-establishment latency increased when all the connections were
> cached. Hence the pooling heuristic of the JDK netlib was adapted.
> 2. PoolingHttpClientConnectionManager expects the application to
> provide `setMaxPerRoute` and `setMaxTotal`, which the implementation uses as
> the total number of connections it can create. For applications using ABFS,
> it is not feasible to provide a value when the connectionManager is
> initialised. The JDK's implementation has no cap on the number of connections
> it can have open at any moment. Hence the pooling heuristic of the JDK netlib
> was adapted.
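
The two reasons above describe a pool that caches only a bounded number of idle connections while placing no cap on how many connections are open at once. A minimal sketch of that heuristic follows. It is an illustration under stated assumptions, not the ABFS implementation: the class KeepAliveCacheSketch, its methods, and the stand-in Conn type are hypothetical, and the most-recently-released-first reuse order is a choice made here for simplicity. Only the "http.maxConnections" property and its default of 5 come from the description.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch: bounded cache of idle connections, no cap on open ones. */
final class KeepAliveCacheSketch {

    /** Stand-in for a real HTTP connection; it only records whether it was closed. */
    static final class Conn {
        private boolean closed;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    private final int maxIdle;
    private final Deque<Conn> idle = new ArrayDeque<>();

    KeepAliveCacheSketch(int maxIdle) {
        this.maxIdle = maxIdle;
    }

    /** Reads the limit from "http.maxConnections", defaulting to 5. */
    static KeepAliveCacheSketch fromSystemProperty() {
        return new KeepAliveCacheSketch(Integer.getInteger("http.maxConnections", 5));
    }

    /** Reuses a cached connection if one exists; otherwise opens a new one. */
    synchronized Conn acquire() {
        Conn cached = idle.pollFirst();
        return cached != null ? cached : new Conn();
    }

    /** Caches the connection if there is room; otherwise closes it. */
    synchronized void release(Conn conn) {
        if (idle.size() < maxIdle) {
            idle.addFirst(conn);
        } else {
            conn.close();
        }
    }

    synchronized int idleCount() {
        return idle.size();
    }

    public static void main(String[] args) {
        KeepAliveCacheSketch pool = new KeepAliveCacheSketch(2);
        Conn a = pool.acquire();
        Conn b = pool.acquire();
        Conn c = pool.acquire(); // three connections open at once: no cap
        pool.release(a);
        pool.release(b);
        pool.release(c);         // cache is full, so this one is closed
        System.out.println("idle=" + pool.idleCount() + " cClosed=" + c.isClosed());
    }
}
```

Because acquire() never blocks or fails when the cache is empty, the caller does not need to supply a maximum up front, which is the property the second reason relies on.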