[
https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Frantisek Vacek updated HDFS-7392:
----------------------------------
Description:
In some specific circumstances,
org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out
and hangs forever.
The specific circumstances are:
1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) points to
a valid IP address, but no name node service is running on it.
2) The URI resolves to at least 2 IP addresses. See the output below and the
resolution sketch after the quoted output:
{quote}
[~/proj/quickbox]$ nslookup share.example.com
Server: 127.0.1.1
Address: 127.0.1.1#53
share.example.com canonical name =
internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
Address: 192.168.1.223
Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
Address: 192.168.1.65
{quote}
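For illustration, a minimal Java sketch (using the example hostname above; not
part of the report) of how a round-robin name resolves to multiple A records,
so consecutive resolutions can return different addresses:
{code:java}
import java.net.InetAddress;
import java.net.UnknownHostException;

public class ResolveDemo {
    public static void main(String[] args) throws UnknownHostException {
        // All A records behind the (example) round-robin name.
        for (InetAddress a : InetAddress.getAllByName("share.example.com")) {
            System.out.println("record: " + a.getHostAddress());
        }
        // getByName() returns a single record; with round-robin DNS,
        // repeated resolutions may pick a different one each time.
        System.out.println("picked: "
                + InetAddress.getByName("share.example.com").getHostAddress());
    }
}
{code}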
In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress()
sometimes returns true (even though the address didn't actually change, see
img. 1) and the timeoutFailures counter is reset to 0 (see img. 2).
maxRetriesOnSocketTimeouts (45) is therefore never reached and the connection
attempt is retried forever.
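To make the failure mode concrete, here is a minimal, self-contained simulation
(not the real Hadoop code; the resolver below is hypothetical, and only
updateAddress(), timeoutFailures and maxRetriesOnSocketTimeouts mirror
identifiers named above) of how the counter reset defeats the retry cap:
{code:java}
public class RetryLoopDemo {
    // Round-robin DNS alternates between the two ELB addresses.
    private static final String[] ADDRS = {"192.168.1.223", "192.168.1.65"};
    private static String current = ADDRS[0];
    private static int resolutions = 0;

    // Re-resolve the name; returns true when the address "changed" -
    // which here is every call, because the resolver alternates records.
    private static boolean updateAddress() {
        String fresh = ADDRS[++resolutions % ADDRS.length];
        boolean changed = !fresh.equals(current);
        current = fresh;
        return changed;
    }

    public static void main(String[] args) {
        final int maxRetriesOnSocketTimeouts = 45;
        int timeoutFailures = 0;
        // Simulate 1000 consecutive connect timeouts (no name node running).
        for (int attempt = 1; attempt <= 1000; attempt++) {
            if (updateAddress()) {
                timeoutFailures = 0; // the reset that defeats the retry cap
            }
            if (++timeoutFailures >= maxRetriesOnSocketTimeouts) {
                System.out.println("would give up at attempt " + attempt);
                return;
            }
        }
        System.out.println("1000 timeouts and still retrying; timeoutFailures="
                + timeoutFailures);
    }
}
{code}
With two addresses in rotation, updateAddress() returns true on every
re-resolution, so timeoutFailures never climbs past 1 and the cap of 45 is
unreachable.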
was:
In some specific circumstances,
org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out
and hangs forever.
The specific circumstances are:
1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) points to
a valid IP address, but no name node service is running on it.
2) The URI resolves to at least 2 IP addresses. See the output below:
{quote}
[~/proj/quickbox]$ nslookup share.example.com
Server: 127.0.1.1
Address: 127.0.1.1#53
share.example.com canonical name =
internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
Address: 54.40.29.223
Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
Address: 54.40.29.65
{quote}
In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress()
sometimes returns true (even though the address didn't actually change, see
img. 1) and the timeoutFailures counter is reset to 0 (see img. 2).
maxRetriesOnSocketTimeouts (45) is therefore never reached and the connection
attempt is retried forever.
> org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
> ---------------------------------------------------------------------
>
> Key: HDFS-7392
> URL: https://issues.apache.org/jira/browse/HDFS-7392
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Reporter: Frantisek Vacek
> Priority: Critical
> Attachments: 1.png, 2.png
>
>
> In some specific circumstances,
> org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times
> out and hangs forever.
> The specific circumstances are:
> 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) points
> to a valid IP address, but no name node service is running on it.
> 2) The URI resolves to at least 2 IP addresses. See the output below:
> {quote}
> [~/proj/quickbox]$ nslookup share.example.com
> Server: 127.0.1.1
> Address: 127.0.1.1#53
> share.example.com canonical name =
> internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
> Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
> Address: 192.168.1.223
> Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
> Address: 192.168.1.65
> {quote}
> In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress()
> sometimes returns true (even though the address didn't actually change, see
> img. 1) and the timeoutFailures counter is reset to 0 (see img. 2).
> maxRetriesOnSocketTimeouts (45) is therefore never reached and the
> connection attempt is retried forever.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)