[
https://issues.apache.org/jira/browse/HDFS-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250253#comment-14250253
]
Frantisek Vacek commented on HDFS-7392:
---------------------------------------
I'm too busy to implement the promised patch, so I'm adding part of a log to show
what is wrong with the connection timeout. Please let me know if it helps.
Fanda
Endless attempt to open the nonexistent HDFS URI
hdfs://share.merck.com/OneLevelHeader.xlsx
opening path: /OneLevelHeader.xlsx ...
DEBUG [main] (Client.java:426) - The ping interval is 60000 ms.
DEBUG [main] (Client.java:695) - Connecting to share.merck.com/54.40.29.223:8020
INFO [main] (Client.java:814) - Retrying connect to server:
share.merck.com/54.40.29.223:8020. Already tried 0 time(s); maxRetries=45
WARN [main] (Client.java:568) - Address change detected. Old:
share.merck.com/54.40.29.223:8020 New: share.merck.com/54.40.29.65:8020
INFO [main] (Client.java:814) - Retrying connect to server:
share.merck.com/54.40.29.65:8020. Already tried 0 time(s); maxRetries=45
INFO [main] (Client.java:814) - Retrying connect to server:
share.merck.com/54.40.29.65:8020. Already tried 1 time(s); maxRetries=45
WARN [main] (Client.java:568) - Address change detected. Old:
share.merck.com/54.40.29.65:8020 New: share.merck.com/54.40.29.223:8020
INFO [main] (Client.java:814) - Retrying connect to server:
share.merck.com/54.40.29.223:8020. Already tried 0 time(s); maxRetries=45
INFO [main] (Client.java:814) - Retrying connect to server:
share.merck.com/54.40.29.223:8020. Already tried 1 time(s); maxRetries=45
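The alternating addresses in the log above come from round-robin DNS on the load balancer: each resolution can return a different address, which the client interprets as an address change. A minimal sketch for listing every address a name resolves to (here `localhost` stands in for the real hostname, which is not reachable from outside):

```java
import java.net.InetAddress;

public class DnsCheck {
    // Prints every address a hostname resolves to. Against a name like
    // share.merck.com this would show the two alternating ELB addresses
    // seen in the log above.
    public static void main(String[] args) throws Exception {
        String host = "localhost";
        for (InetAddress a : InetAddress.getAllByName(host)) {
            System.out.println(host + " -> " + a.getHostAddress());
        }
    }
}
```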
> org.apache.hadoop.hdfs.DistributedFileSystem open invalid URI forever
> ---------------------------------------------------------------------
>
> Key: HDFS-7392
> URL: https://issues.apache.org/jira/browse/HDFS-7392
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: hdfs-client
> Reporter: Frantisek Vacek
> Assignee: Yi Liu
> Attachments: 1.png, 2.png
>
>
> In some specific circumstances,
> org.apache.hadoop.hdfs.DistributedFileSystem.open(invalid URI) never times out
> and lasts forever.
> The specific circumstances are:
> 1) The HDFS URI (hdfs://share.example.com:8020/someDir/someFile.txt) should point
> to a valid IP address, but with no name node service running on it.
> 2) There should be at least 2 IP addresses for such a URI. See the output below:
> {quote}
> [~/proj/quickbox]$ nslookup share.example.com
> Server: 127.0.1.1
> Address: 127.0.1.1#53
> share.example.com canonical name =
> internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com.
> Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
> Address: 192.168.1.223
> Name: internal-realm-share-example-com-1234.us-east-1.elb.amazonaws.com
> Address: 192.168.1.65
> {quote}
> In such a case, org.apache.hadoop.ipc.Client.Connection.updateAddress()
> sometimes returns true (even if the address didn't actually change, see img. 1)
> and the timeoutFailures counter is reset to 0 (see img. 2). The
> maxRetriesOnSocketTimeouts limit (45) is therefore never reached and the
> connection attempt is repeated forever.
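The described failure mode can be simulated outside Hadoop. The sketch below is hypothetical (it does not use the real Client.Connection class); it only models the counter-reset behavior: when DNS alternates between two addresses, every "address change" resets the failure counter, so the retry budget is never exhausted.

```java
import java.util.Arrays;
import java.util.List;

public class RetryLoopSketch {
    static final int MAX_RETRIES = 45; // mirrors maxRetriesOnSocketTimeouts

    // Simulates connection attempts against a name whose resolutions cycle
    // through the given addresses. Returns the attempt count at which the
    // client would give up, or -1 if it never does.
    static int attemptsUntilGiveUp(List<String> dnsRoundRobin, int maxAttempts) {
        String current = dnsRoundRobin.get(0);
        int timeoutFailures = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String resolved = dnsRoundRobin.get(attempt % dnsRoundRobin.size());
            if (!resolved.equals(current)) {
                // updateAddress() reports a "change" and the failure
                // counter is reset to 0 -- the bug being reported.
                current = resolved;
                timeoutFailures = 0;
            }
            timeoutFailures++; // this connect() attempt timed out
            if (timeoutFailures > MAX_RETRIES) {
                return attempt + 1; // retry budget exhausted: give up
            }
        }
        return -1; // never gave up within maxAttempts
    }

    public static void main(String[] args) {
        // Two addresses, as in the nslookup output above: retries forever.
        List<String> twoIps = Arrays.asList("192.168.1.223", "192.168.1.65");
        // A single stable address: gives up after MAX_RETRIES + 1 attempts.
        List<String> oneIp = Arrays.asList("192.168.1.223");

        System.out.println(attemptsUntilGiveUp(twoIps, 10_000)); // -1
        System.out.println(attemptsUntilGiveUp(oneIp, 10_000));  // 46
    }
}
```

With two alternating addresses, the counter never exceeds 1, so the loop runs until the caller's patience (here, maxAttempts) runs out, matching the endless INFO/WARN cycle in the log.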
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)