[ https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14708503#comment-14708503 ]

Raul Gutierrez Segales commented on ZOOKEEPER-2247:
---------------------------------------------------

Generally, lgtm. A few nits:

* In src/java/systest/org/apache/zookeeper/test/system/NonRecoverableErrorTests.java, testZookeeperSericeShouldBeAvailableEvenAfterNonRecoverableErrorOnLeader could probably just be named testZooKeeperServiceAvailableOnLeader (or something along those lines...)

* In TestUtils, there's a typo in deleteFileRecusively

* also about TestUtils.deleteFileRecursively - if you grep for recursiveDelete you'll see that a few tests have their own (repeated) definition as well:

{code}
src/java/systest/org/apache/zookeeper/test/system/QuorumPeerInstance.java
src/java/test/org/apache/zookeeper/test/ClientBase.java
src/java/test/org/apache/zookeeper/server/quorum/LearnerTest.java
src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java
{code}

* and these are used across multiple test files... Can we have everyone use TestUtils.deleteFileRecursively while we are at it? It would be easier to clean this up now than to do it in another patch...

* given that QuorumPeer.setConfigFileName is protected, maybe it's easier/cleaner to extend QuorumPeer and add a public setter for configFileName. The problem with using reflection is that if it breaks, we'd only know after running CI, whereas otherwise compilation would fail and we'd know sooner.


> Zookeeper service becomes unavailable when leader fails to write transaction log
> --------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2247
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.5.0
>            Reporter: Arshad Mohammad
>            Assignee: Arshad Mohammad
>            Priority: Critical
>             Fix For: 3.5.2
>
>         Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch
>
>
> Zookeeper service becomes unavailable when leader fails to write transaction log.
> Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from thread : SyncThread:100
> java.io.IOException: Input/output error
> 	at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
> 	at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
> 	at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
> 	at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
> 	at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
> 	at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
> 	at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
> 	at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO  [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO  [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO  [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO  [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO  [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO  [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO  [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO  [ProcessThread(sid:100 cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception, the leader server still remains leader. After this non-recoverable exception, the leader should go down and let one of the followers become leader.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)