[
https://issues.apache.org/jira/browse/HADOOP-15129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16366892#comment-16366892
]
Steve Loughran commented on HADOOP-15129:
-----------------------------------------
Could always add a specific failure call and wiki link here
> Datanode caches namenode DNS lookup failure and cannot startup
> --------------------------------------------------------------
>
> Key: HADOOP-15129
> URL: https://issues.apache.org/jira/browse/HADOOP-15129
> Project: Hadoop Common
> Issue Type: Bug
> Components: ipc
> Affects Versions: 2.8.2
> Environment: Google Compute Engine.
> I'm using Java 8, Debian 8, Hadoop 2.8.2.
> Reporter: Karthik Palaniappan
> Assignee: Karthik Palaniappan
> Priority: Minor
> Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch
>
>
> On startup, the Datanode creates an InetSocketAddress to register with each
> namenode. Though there are retries on connection failure throughout the
> stack, the same InetSocketAddress is reused.
> InetSocketAddress is an interesting class, because it resolves DNS names to
> IP addresses on construction, and it is never refreshed. Hadoop re-creates an
> InetSocketAddress in some cases just in case the remote IP has changed for a
> particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472.
> Anyway, on startup, you cna see the Datanode log: "Namenode...remains
> unresolved" -- referring to the fact that DNS lookup failed.
> {code:java}
> 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> Refresh request received for nameservices: null
> 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode
> for null remains unresolved for ID null. Check your hdfs-site.xml file to
> ensure namenodes are configured properly.
> 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> Starting BPOfferServices for nameservices: <default>
> 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
> Block pool <registering> (Datanode Uuid unassigned) service to
> cluster-32f5-m:8020 starting to offer service
> {code}
> The Datanode then proceeds to use this unresolved address, as it may work if
> the DN is configured to use a proxy. Since I'm not using a proxy, it forever
> prints out this message:
> {code:java}
> 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Problem connecting to server: cluster-32f5-m:8020
> {code}
> Unfortunately, the log doesn't contain the exception that triggered it, but
> the culprit is actually in IPC Client:
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444.
> This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487
> to give a clear error message when somebody mispells an address.
> However, the fix in HADOOP-7472 doesn't apply here, because that code happens
> in Client#getConnection after the Connection is constructed.
> My proposed fix (will attach a patch) is to move this exception out of the
> constructor and into a place that will trigger HADOOP-7472's logic to
> re-resolve addresses. If the DNS failure was temporary, this will allow the
> connection to succeed. If not, the connection will fail after ipc client
> retries (default 10 seconds worth of retries).
> I want to fix this in ipc client rather than just in Datanode startup, as
> this fixes temporary DNS issues for all of Hadoop.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]