[ 
https://issues.apache.org/jira/browse/HDFS-12834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16257703#comment-16257703
 ] 

Hadoop QA commented on HDFS-12834:
----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue}  2m 24s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} trunk Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 16m 28s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 41s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  1m  7s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 11m 55s{color} | {color:green} branch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  2m  1s{color} | {color:green} trunk passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 58s{color} | {color:green} trunk passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 10s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 51s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 35s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvnsite {color} | {color:green}  0m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedclient {color} | {color:green} 10m 53s{color} | {color:green} patch has no errors when building and testing our client artifacts. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green}  1m 56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 47s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 83m 53s{color} | {color:red} hadoop-hdfs in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 21s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}137m 32s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
|   | hadoop.fs.TestUnbuffer |
|   | hadoop.hdfs.server.blockmanagement.TestUnderReplicatedBlocks |
|   | hadoop.fs.viewfs.TestViewFileSystemLinkMergeSlash |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hadoop:5b98639 |
| JIRA Issue | HDFS-12834 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12898267/HDFS-12834.01.patch |
| Optional Tests |  asflicense  compile  javac  javadoc  mvninstall  mvnsite  unit  shadedclient  findbugs  checkstyle  |
| uname | Linux c6b6ca1d3897 3.13.0-129-generic #178-Ubuntu SMP Fri Aug 11 12:48:20 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /testptch/patchprocess/precommit/personality/provided.sh |
| git revision | trunk / 0940e4f |
| maven | version: Apache Maven 3.3.9 |
| Default Java | 1.8.0_151 |
| findbugs | v3.1.0-RC1 |
| unit | https://builds.apache.org/job/PreCommit-HDFS-Build/22133/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HDFS-Build/22133/testReport/ |
| Max. process+thread count | 3883 (vs. ulimit of 5000) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://builds.apache.org/job/PreCommit-HDFS-Build/22133/console |
| Powered by | Apache Yetus 0.7.0-SNAPSHOT   http://yetus.apache.org |


This message was automatically generated.



> DFSZKFailoverController on error exits with 0 error code
> --------------------------------------------------------
>
>                 Key: HDFS-12834
>                 URL: https://issues.apache.org/jira/browse/HDFS-12834
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 2.7.3, 3.0.0-alpha4
>            Reporter: Zbigniew Kostrzewa
>            Assignee: Bharat Viswanadham
>         Attachments: HDFS-12834.00.patch, HDFS-12834.01.patch
>
>
> On error, {{DFSZKFailoverController}} exits with return code 0. This causes 
> problems when integrating it with scripts and monitoring tools such as 
> systemd: when systemd is configured to restart the service only on failure, 
> it does not restart ZKFC because the process exited with 0.
> For example, in my case systemd reported that zkfc exited successfully, but 
> in the logs I found this:
> {noformat}
> 2017-11-14 05:33:55,075 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 3334ms for sessionid 0x15fb794bd240001, closing socket connection and attempting reconnect
> 2017-11-14 05:33:55,178 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
> 2017-11-14 05:33:55,564 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.9.4.73/10.9.4.73:2182. Will not attempt to authenticate using SASL (unknown error)
> 2017-11-14 05:33:55,566 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.9.4.73/10.9.4.73:2182, initiating session
> 2017-11-14 05:33:55,569 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server 10.9.4.73/10.9.4.73:2182, sessionid = 0x15fb794bd240001, negotiated timeout = 5000
> 2017-11-14 05:33:55,570 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session connected.
> 2017-11-14 05:33:58,230 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x15fb794bd240001, likely server has closed socket, closing socket connection and attempting reconnect
> 2017-11-14 05:33:58,335 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
> 2017-11-14 05:33:58,402 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.9.4.138/10.9.4.138:2181. Will not attempt to authenticate using SASL (unknown error)
> 2017-11-14 05:33:58,403 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.9.4.138/10.9.4.138:2181, initiating session
> 2017-11-14 05:33:58,406 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x15fb794bd240001, likely server has closed socket, closing socket connection and attempting reconnect
> 2017-11-14 05:33:59,218 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.9.4.228/10.9.4.228:2183. Will not attempt to authenticate using SASL (unknown error)
> 2017-11-14 05:33:59,219 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to 10.9.4.228/10.9.4.228:2183, initiating session
> 2017-11-14 05:33:59,221 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x15fb794bd240001, likely server has closed socket, closing socket connection and attempting reconnect
> 2017-11-14 05:34:01,094 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server 10.9.4.73/10.9.4.73:2182. Will not attempt to authenticate using SASL (unknown error)
> 2017-11-14 05:34:01,094 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 1773ms for sessionid 0x15fb794bd240001, closing socket connection and attempting reconnect
> 2017-11-14 05:34:01,196 FATAL org.apache.hadoop.ha.ActiveStandbyElector: Received stat error from Zookeeper. code:CONNECTIONLOSS. Not retrying further znode monitoring connection errors.
> 2017-11-14 05:34:02,153 INFO org.apache.zookeeper.ZooKeeper: Session: 0x15fb794bd240001 closed
> 2017-11-14 05:34:02,154 FATAL org.apache.hadoop.ha.ZKFailoverController: Fatal error occurred:Received stat error from Zookeeper. code:CONNECTIONLOSS. Not retrying further znode monitoring connection errors.
> 2017-11-14 05:34:02,154 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2017-11-14 05:34:05,208 INFO org.apache.hadoop.ipc.Server: Stopping server on 8019
> 2017-11-14 05:34:05,487 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8019
> 2017-11-14 05:34:05,488 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
> 2017-11-14 05:34:05,487 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
> 2017-11-14 05:34:05,488 INFO org.apache.hadoop.ha.HealthMonitor: Stopping HealthMonitor thread
> 2017-11-14 05:34:05,490 FATAL org.apache.hadoop.hdfs.tools.DFSZKFailoverController: Got a fatal error, exiting now
> java.lang.RuntimeException: ZK Failover Controller failed: Received stat error from Zookeeper. code:CONNECTIONLOSS. Not retrying further znode monitoring connection errors.
>         at org.apache.hadoop.ha.ZKFailoverController.mainLoop(ZKFailoverController.java:369)
>         at org.apache.hadoop.ha.ZKFailoverController.doRun(ZKFailoverController.java:238)
>         at org.apache.hadoop.ha.ZKFailoverController.access$000(ZKFailoverController.java:61)
>         at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:172)
>         at org.apache.hadoop.ha.ZKFailoverController$1.run(ZKFailoverController.java:168)
>         at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:415)
>         at org.apache.hadoop.ha.ZKFailoverController.run(ZKFailoverController.java:168)
>         at org.apache.hadoop.hdfs.tools.DFSZKFailoverController.main(DFSZKFailoverController.java:181)
> {noformat}
> The code that seems responsible is in {{DFSZKFailoverController.java}}:
> {code}
>   public static void main(String args[])
>       throws Exception {
> ...
>     int retCode = 0;
>     try {
>       retCode = zkfc.run(parser.getRemainingArgs());
>     } catch (Throwable t) {
>       LOG.fatal("Got a fatal error, exiting now", t); 
>     }   
>     System.exit(retCode);
>   }
> {code}
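
In the quoted snippet, {{retCode}} is initialized to 0 and is only reassigned if {{zkfc.run(...)}} returns normally, so any {{Throwable}} leaves it at 0. The following is a minimal, self-contained sketch of one possible way to avoid that, and is not necessarily what the attached patches do. The class {{ExitCodeSketch}}, the method {{exitCodeFor}}, and the use of {{Callable}} as a stand-in for {{zkfc.run(...)}} are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.Callable;

public class ExitCodeSketch {

    /**
     * Runs the given body and returns the exit code the process should use.
     * "body" is a hypothetical stand-in for zkfc.run(parser.getRemainingArgs()).
     */
    public static int exitCodeFor(Callable<Integer> body) {
        // Assumption: start from a non-zero code so that a thrown error
        // is reported as a failure instead of silently staying at 0.
        int retCode = 1;
        try {
            retCode = body.call();
        } catch (Throwable t) {
            System.err.println("Got a fatal error, exiting now: " + t);
        }
        return retCode;
    }

    public static void main(String[] args) {
        // Simulated fatal error, similar in spirit to the one in the log above.
        int failed = exitCodeFor(() -> {
            throw new RuntimeException("ZK Failover Controller failed");
        });
        // Simulated clean run.
        int ok = exitCodeFor(() -> 0);

        // A real main() would pass the value to System.exit(...); the sketch
        // only prints it so that both cases can be shown in one run.
        System.out.println("exit code on failure: " + failed);
        System.out.println("exit code on success: " + ok);
    }
}
```

With a non-zero code on failure, a supervisor such as systemd that restarts a service only on failure would be able to tell the fatal exit apart from a clean shutdown.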



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
