[ https://issues.apache.org/jira/browse/HADOOP-12622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15046964#comment-15046964 ]
Hadoop QA commented on HADOOP-12622:
------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 7m 56s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 3s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 28s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 0m 17s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 6s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 13s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 1m 59s {color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 57s {color} | {color:green} trunk passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s {color} | {color:green} trunk passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 39s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 0s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 0s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 9m 14s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 9m 14s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 0m 17s {color} | {color:red} Patch generated 2 new checkstyle issues in hadoop-common-project/hadoop-common (total was 47, now 47). {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green} 1m 4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 15s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 2m 4s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 56s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 7s {color} | {color:green} the patch passed with JDK v1.7.0_85 {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 17m 8s {color} | {color:red} hadoop-common in the patch failed with JDK v1.8.0_66. {color} |
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 8m 35s {color} | {color:green} hadoop-common in the patch passed with JDK v1.7.0_85. {color} |
| {color:red}-1{color} | {color:red} asflicense {color} | {color:red} 0m 19s {color} | {color:red} Patch generated 3 ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 84m 51s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| JDK v1.8.0_66 Failed junit tests | hadoop.fs.TestLocalFsFCStatistics |
| JDK v1.8.0_66 Timed out junit tests | org.apache.hadoop.http.TestHttpServerLifecycle |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Image:yetus/hadoop:0ca8df7 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12776313/HADOOP-12622-v2.patch |
| JIRA Issue | HADOOP-12622 |
| Optional Tests | asflicense compile javac javadoc mvninstall mvnsite unit findbugs checkstyle |
| uname | Linux 3d644b8155ae 3.13.0-36-lowlatency #63-Ubuntu SMP PREEMPT Wed Sep 3 21:56:12 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/hadoop/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / fc47084 |
| findbugs | v3.0.0 |
| checkstyle | https://builds.apache.org/job/PreCommit-HADOOP-Build/8205/artifact/patchprocess/diff-checkstyle-hadoop-common-project_hadoop-common.txt |
| unit | https://builds.apache.org/job/PreCommit-HADOOP-Build/8205/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt |
| unit test logs | https://builds.apache.org/job/PreCommit-HADOOP-Build/8205/artifact/patchprocess/patch-unit-hadoop-common-project_hadoop-common-jdk1.8.0_66.txt |
| JDK v1.7.0_85 Test Results | https://builds.apache.org/job/PreCommit-HADOOP-Build/8205/testReport/ |
| asflicense | https://builds.apache.org/job/PreCommit-HADOOP-Build/8205/artifact/patchprocess/patch-asflicense-problems.txt |
| modules | C: hadoop-common-project/hadoop-common U: hadoop-common-project/hadoop-common |
| Max memory used | 76MB |
| Powered by | Apache Yetus http://yetus.apache.org |
| Console output |
https://builds.apache.org/job/PreCommit-HADOOP-Build/8205/console |


This message was automatically generated.

> RetryPolicies (other than FailoverOnNetworkExceptionRetry) should put on
> retry failed reason or the log from RMProxy's retry could be very misleading.
> ------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HADOOP-12622
> URL: https://issues.apache.org/jira/browse/HADOOP-12622
> Project: Hadoop Common
> Issue Type: Bug
> Components: auto-failover
> Affects Versions: 2.6.0, 2.7.0
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
> Attachments: HADOOP-12622-v2.patch, HADOOP-12622.patch
>
>
> When debugging an NM retrying its connection to the RM (non-HA), the NM log
> during RM downtime is very misleading:
> {noformat}
> 2015-12-07 11:37:14,098 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:15,099 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:16,101 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:17,103 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:18,105 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:19,107 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:20,109 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:21,112 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 7 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:22,113 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 8 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:23,115 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 9 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:54,120 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 0 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:55,121 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 1 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:56,123 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 2 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:57,125 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 3 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:58,126 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 4 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:37:59,128 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 5 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> 2015-12-07 11:38:00,130 INFO org.apache.hadoop.ipc.Client: Retrying connect
> to server: 0.0.0.0/0.0.0.0:8031. Already tried 6 time(s); retry policy is
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000
> MILLISECONDS)
> {noformat}
> This output only logs the client-side retry on network connection failure; it
> includes no information from RetryInvocationHandler, where the real retry
> policy works. As the code below from RetryInvocationHandler.java shows, even
> when the retry ends we do not emit a warning that states how much time or how
> many attempts were spent in the retry logic, which makes debugging harder.
> {code}
>         if (failAction != null) {
>           if (failAction.reason != null) {
>             LOG.warn("Exception while invoking " +
> currentProxy.proxy.getClass()
>                 + "." + method.getName() + " over " + currentProxy.proxyInfo
>                 + ". Not retrying because " + failAction.reason, ex);
>           }
>           throw ex;
>         }
> {code}
> We should set failAction.reason wherever we can across the retry policies.
> In addition, the log level for messages during retry attempts should be
> consistent: currently ipc.Client logs at INFO, but RetryInvocationHandler
> logs at DEBUG (if not fail_over). We should keep them consistent or it could
> be very confusing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
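
For illustration only, here is a minimal sketch of the idea in the description above: a fixed-sleep retry policy that always attaches a reason to its terminal decision, so the caller's "Not retrying because ..." warning can state how many attempts and roughly how much sleep time were spent. Everything in it is an assumption made for the example: `RetryReasonSketch`, `Action`, and `decide` are hypothetical stand-ins, not Hadoop's `RetryPolicy` classes and not the contents of the attached patch.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-ins for illustration; these are NOT Hadoop's retry classes.
final class RetryReasonSketch {

    /** A retry decision that always carries a human-readable reason. */
    static final class Action {
        final boolean retry;
        final long delayMillis;
        final String reason;

        Action(boolean retry, long delayMillis, String reason) {
            this.retry = retry;
            this.delayMillis = delayMillis;
            this.reason = reason;
        }
    }

    /** Fixed-sleep policy: retry until maxRetries is reached, then fail with a reason. */
    static Action decide(int retriesSoFar, int maxRetries, long sleepTime, TimeUnit unit) {
        long sleepMillis = unit.toMillis(sleepTime);
        if (retriesSoFar >= maxRetries) {
            // Terminal decision: record the attempt count and the total sleep time.
            return new Action(false, 0L,
                "retries exhausted: retries=" + retriesSoFar
                    + ", maxRetries=" + maxRetries
                    + ", total sleep=" + (sleepMillis * retriesSoFar) + " ms");
        }
        return new Action(true, sleepMillis,
            "retry " + (retriesSoFar + 1) + " of " + maxRetries);
    }

    public static void main(String[] args) {
        // Simulate an invocation handler that consults the policy after each failure.
        for (int attempt = 0; ; attempt++) {
            Action action = decide(attempt, 3, 1000, TimeUnit.MILLISECONDS);
            if (!action.retry) {
                System.out.println("WARN Not retrying because " + action.reason);
                break;
            }
            System.out.println("INFO Will retry after " + action.delayMillis
                + " ms (" + action.reason + ")");
        }
    }
}
```

Because the reason is never null in this sketch, a guard like the `failAction.reason != null` check quoted above would always reach the warning. Logging both branches at the same level is a separate choice that the sketch only imitates with the `INFO` and `WARN` prefixes.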