[ 
https://issues.apache.org/jira/browse/HADOOP-15129?focusedWorklogId=643080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-643080
 ]

ASF GitHub Bot logged work on HADOOP-15129:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 28/Aug/21 07:06
            Start Date: 28/Aug/21 07:06
    Worklog Time Spent: 10m 
      Work Description: hadoop-yetus commented on pull request #3348:
URL: https://github.com/apache/hadoop/pull/3348#issuecomment-907584767


   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 57s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  33m 45s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  23m  6s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  compile  |  19m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  1s |  |  trunk passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 29s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  18m 30s |  |  branch has no errors 
when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 58s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m  8s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javac  |  22m  8s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  19m 35s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  javac  |  19m 35s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 59s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 31s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  1s |  |  the patch passed with JDK 
Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04  |
   | +1 :green_heart: |  javadoc  |   1m 35s |  |  the patch passed with JDK 
Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10  |
   | +1 :green_heart: |  spotbugs  |   2m 34s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  18m 35s |  |  patch has no errors 
when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  18m 26s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 49s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 191m 49s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3348/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/3348 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell |
   | uname | Linux 4f48b63252a5 4.15.0-147-generic #151-Ubuntu SMP Fri Jun 18 
19:21:19 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / c8d5b06ff8c2e7629d0bdf5713b7cf35f87bdad6 |
   | Default Java | Private Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.11+9-Ubuntu-0ubuntu2.20.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_292-8u292-b10-0ubuntu1~20.04-b10 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3348/1/testReport/ |
   | Max. process+thread count | 2539 (vs. ulimit of 5500) |
   | modules | C: hadoop-common-project/hadoop-common U: 
hadoop-common-project/hadoop-common |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-3348/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0-SNAPSHOT https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 643080)
    Time Spent: 20m  (was: 10m)

> Datanode caches namenode DNS lookup failure and cannot startup
> --------------------------------------------------------------
>
>                 Key: HADOOP-15129
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15129
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 2.8.2
>         Environment: Google Compute Engine.
> I'm using Java 8, Debian 8, Hadoop 2.8.2.
>            Reporter: Karthik Palaniappan
>            Assignee: Chris Nauroth
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HADOOP-15129.001.patch, HADOOP-15129.002.patch
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> On startup, the Datanode creates an InetSocketAddress to register with each 
> namenode. Though there are retries on connection failure throughout the 
> stack, the same InetSocketAddress is reused.
> InetSocketAddress is an interesting class, because it resolves DNS names to 
> IP addresses on construction, and it is never refreshed. Hadoop re-creates an 
> InetSocketAddress in some cases just in case the remote IP has changed for a 
> particular DNS name: https://issues.apache.org/jira/browse/HADOOP-7472.
> Anyway, on startup, you cna see the Datanode log: "Namenode...remains 
> unresolved" -- referring to the fact that DNS lookup failed.
> {code:java}
> 2017-11-02 16:01:55,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Refresh request received for nameservices: null
> 2017-11-02 16:01:55,153 WARN org.apache.hadoop.hdfs.DFSUtilClient: Namenode 
> for null remains unresolved for ID null. Check your hdfs-site.xml file to 
> ensure namenodes are configured properly.
> 2017-11-02 16:01:55,156 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Starting BPOfferServices for nameservices: <default>
> 2017-11-02 16:01:55,169 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Block pool <registering> (Datanode Uuid unassigned) service to 
> cluster-32f5-m:8020 starting to offer service
> {code}
> The Datanode then proceeds to use this unresolved address, as it may work if 
> the DN is configured to use a proxy. Since I'm not using a proxy, it forever 
> prints out this message:
> {code:java}
> 2017-12-15 00:13:40,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:45,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:50,712 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:13:55,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> 2017-12-15 00:14:00,713 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
> Problem connecting to server: cluster-32f5-m:8020
> {code}
> Unfortunately, the log doesn't contain the exception that triggered it, but 
> the culprit is actually in IPC Client: 
> https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Client.java#L444.
> This line was introduced in https://issues.apache.org/jira/browse/HADOOP-487 
> to give a clear error message when somebody mispells an address.
> However, the fix in HADOOP-7472 doesn't apply here, because that code happens 
> in Client#getConnection after the Connection is constructed.
> My proposed fix (will attach a patch) is to move this exception out of the 
> constructor and into a place that will trigger HADOOP-7472's logic to 
> re-resolve addresses. If the DNS failure was temporary, this will allow the 
> connection to succeed. If not, the connection will fail after ipc client 
> retries (default 10 seconds worth of retries).
> I want to fix this in ipc client rather than just in Datanode startup, as 
> this fixes temporary DNS issues for all of Hadoop.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to