[ https://issues.apache.org/jira/browse/HDFS-14594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastien Barnoud updated HDFS-14594:
-------------------------------------
Description:
When authentication is enabled, there is no keep-alive on http(s) connections.
That's because the JDK Http(s)URLConnection explicitly closes the connection
after the HTTP 401 that negotiates the authentication.
This leads to poor performance, especially when encryption is on.
To see the issue, simply run strace and compare the number of connections made
by the hdfs implementation and by curl:
{code:java}
$ strace -T -tt -f hdfs dfs -ls swebhdfs://dtltstap009.fr.world.socgen:50470/user 2>&1 | grep "sin_port=htons(50470)"
[pid 92879] 15:11:47.019865 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000157>
[pid 92879] 15:11:47.182110 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
[pid 92879] 15:11:47.387073 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000167>
[pid 92879] 15:11:47.429716 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
[pid 93116] 15:11:47.528073 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000110>
[pid 93116] 15:11:47.566947 connect(386, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
=> 6 connect
{code}
{code:java}
$ strace -T -tt -f curl --negotiate -u: -v https://dtltstap009.fr.world.socgen:50470/webhdfs/v1/user/?op=GETFILESTATUS 2>&1 | grep "sin_port=htons(50470)"
15:10:53.671358 connect(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000118>
15:10:53.683513 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000009>
15:10:53.869482 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000009>
15:10:53.869576 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), sin_addr=inet_addr("192.163.201.117")}, [16]) = 0 <0.000008>
[bash-4.2.46][j:0|h:4961|?:0][2019-06-21 15:10:53][dtlprd05@nazare:~/test-hdfs]
=> only one connect
{code}
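The comparison above can also be scripted when many traces have to be checked. The following is only a minimal sketch: the class name ConnectCounter and its method are hypothetical helpers, not part of Hadoop or strace. It counts connect() syscall lines for a given port, which is the same counting used in the traces above and is not necessarily the number of distinct TCP connections.

```java
import java.util.Arrays;
import java.util.List;

class ConnectCounter {
    /** Counts strace lines that record a connect() call to the given port. */
    static long countConnects(List<String> straceLines, int port) {
        String portToken = "sin_port=htons(" + port + ")";
        return straceLines.stream()
                // getpeername() lines mention the port too, so require "connect(" as well.
                .filter(line -> line.contains("connect(") && line.contains(portToken))
                .count();
    }

    public static void main(String[] args) {
        // Two lines taken from the curl trace: one connect(), one getpeername().
        List<String> sample = Arrays.asList(
            "15:10:53.671358 connect(3, {sa_family=AF_INET, sin_port=htons(50470), "
                + "sin_addr=inet_addr(\"192.163.201.117\")}, 16) = -1 EINPROGRESS "
                + "(Operation now in progress) <0.000118>",
            "15:10:53.683513 getpeername(3, {sa_family=AF_INET, sin_port=htons(50470), "
                + "sin_addr=inet_addr(\"192.163.201.117\")}, [16]) = 0 <0.000009>");
        System.out.println(countConnects(sample, 50470)); // prints 1
    }
}
```

In practice the list would be read from a saved strace log, one entry per unwrapped log line.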
In addition, even without encryption, too many connections are used:
{code:java}
$ strace -T -tt -f hdfs dfs -ls webhdfs://dtltstap009.fr.world.socgen:50070/user 2>&1 | grep "sin_port=htons(50070)"
[pid 99569] 15:13:13.838257 connect(386, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16) = -1 EINPROGRESS (Operation now in progress) <0.000119>
[pid 99569] 15:13:13.904255 connect(386, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
[pid 99635] 15:13:14.201236 connect(386, {sa_family=AF_INET, sin_port=htons(50070), sin_addr=inet_addr("192.163.201.117")}, 16 <unfinished ...>
=> 3 connect
{code}
Finally, some webhdfs commands hang, for reasons we have not been able to
explain, in
sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1375):
-) hdfs dfs commands with the swebhdfs scheme
-) some Tez jobs that use the same implementation for the shuffle service when
encryption is on
All other services (typically RPC) are working fine on the cluster.
It really seems that Http(s)URLConnection causes some issues that Netty or
HttpClient don't have.
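For anyone who wants to check connection reuse from inside the JVM rather than with strace, here is a rough sketch. Everything in it is an assumption made for illustration: the class name KeepAliveProbe is hypothetical, and the throwaway local server uses the JDK's built-in com.sun.net.httpserver with no authentication and no TLS, so it does not reproduce the 401/Negotiate behaviour described above. It only shows one way to count how many TCP connections HttpURLConnection opens for a series of requests, by recording the distinct client ports the server sees. A client port reused by a later connection would be undercounted.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class KeepAliveProbe {
    /**
     * Sends the given number of GET requests to a throwaway local server and
     * returns how many distinct client ports the server observed.
     */
    static int distinctConnections(int requests) throws Exception {
        Set<Integer> clientPorts = ConcurrentHashMap.newKeySet();
        HttpServer server = HttpServer.create(
                new InetSocketAddress(InetAddress.getByName("127.0.0.1"), 0), 0);
        server.createContext("/", exchange -> {
            // Each new TCP connection normally arrives from a new ephemeral port.
            clientPorts.add(exchange.getRemoteAddress().getPort());
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            for (int i = 0; i < requests; i++) {
                HttpURLConnection conn =
                        (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
                // Drain the body fully so the JDK is able to return the socket
                // to its keep-alive cache.
                try (InputStream in = conn.getInputStream()) {
                    while (in.read() != -1) {
                        // discard
                    }
                }
            }
        } finally {
            server.stop(0);
        }
        return clientPorts.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("connections used for 5 requests: " + distinctConnections(5));
    }
}
```

Pointing the same client loop at a SPNEGO-protected endpoint, with the port count replaced by strace or server-side logging, would be the way to compare against the numbers reported here.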
Regards,
> Replace all Http(s)URLConnection
> --------------------------------
>
> Key: HDFS-14594
> URL: https://issues.apache.org/jira/browse/HDFS-14594
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: webhdfs
> Affects Versions: 2.7.3
> Environment: HDP 2.6.5 and HDP 2.6.2
> HotSpot 8u192 and 8u92
> Linux Redhat 3.10.0-862.14.4.el7.x86_64
> Reporter: Sebastien Barnoud
> Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)