[
https://issues.apache.org/jira/browse/ZOOKEEPER-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15131739#comment-15131739
]
Flavio Junqueira commented on ZOOKEEPER-2247:
---------------------------------------------
[~rakesh_r] Thanks for the clarification, but I still find the predicates a
bit confusing, so please bear with me. {{isRunning()}} should return true if
the server is running, and the main loop should keep going as long as
{{isRunning()}} returns true. If there is an error in one of the processors,
then the server isn't really running, and in that case we want the main loop
to exit.
I proposed {{isStateRunning}} before because, in the shutdown methods you
pointed out above for learner, observer, and RO, we need to know whether the
server needs to be shut down. However, it sounds like it would be better to
have a call like {{needsShutdown()}} instead of {{isStateRunning}}, with a
body like {{return state == State.RUNNING || state == State.ERROR}}. The
method {{isRunning()}} should go back to {{state == State.RUNNING}}.
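To make the two predicates concrete, here is a minimal sketch of how they could fit together. It is not the actual ZooKeeperServer code: the class names, {{setState()}}, the boolean return of {{shutdown()}}, and the {{INITIAL}} and {{SHUTDOWN}} states are assumptions for illustration. Only {{isRunning()}}, {{needsShutdown()}}, {{State.RUNNING}}, and {{State.ERROR}} come from the discussion above.

```java
// Hypothetical sketch of the proposed predicates; not the real ZooKeeperServer.
class ServerStateSketch {
    // INITIAL and SHUTDOWN are assumed here; the comment only names RUNNING and ERROR.
    enum State { INITIAL, RUNNING, SHUTDOWN, ERROR }

    // volatile so a state change made by a processor thread is visible to the main loop.
    private volatile State state = State.INITIAL;

    void setState(State newState) {
        state = newState;
    }

    // True only while the server is healthy; the main loop keeps going on this.
    boolean isRunning() {
        return state == State.RUNNING;
    }

    // True when there is still something to tear down: a healthy server,
    // or one that hit an error and has not been shut down yet.
    boolean needsShutdown() {
        return state == State.RUNNING || state == State.ERROR;
    }

    // Returns true if this call performed the shutdown, false if nothing was needed.
    boolean shutdown() {
        if (!needsShutdown()) {
            return false;
        }
        // Real processor teardown would happen here.
        state = State.SHUTDOWN;
        return true;
    }
}

class ServerStateDemo {
    public static void main(String[] args) {
        ServerStateSketch server = new ServerStateSketch();
        server.setState(ServerStateSketch.State.RUNNING);

        int iterations = 0;
        while (server.isRunning()) {
            iterations++;
            if (iterations == 3) {
                // Simulate a processor reporting an unrecoverable error.
                server.setState(ServerStateSketch.State.ERROR);
            }
        }

        System.out.println("main loop exited after " + iterations + " iterations");
        System.out.println("needsShutdown = " + server.needsShutdown());
        System.out.println("shutdown performed = " + server.shutdown());
    }
}
```

With this split, an error in a processor makes the main loop exit because {{isRunning()}} turns false, while the shutdown paths can still tell that cleanup is pending because {{needsShutdown()}} stays true until the shutdown completes.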
Let me know if this makes sense.
> Zookeeper service becomes unavailable when leader fails to write transaction
> log
> --------------------------------------------------------------------------------
>
> Key: ZOOKEEPER-2247
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2247
> Project: ZooKeeper
> Issue Type: Bug
> Affects Versions: 3.5.0
> Reporter: Arshad Mohammad
> Assignee: Arshad Mohammad
> Priority: Critical
> Fix For: 3.4.9, 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-2247-01.patch, ZOOKEEPER-2247-02.patch,
> ZOOKEEPER-2247-03.patch, ZOOKEEPER-2247-04.patch, ZOOKEEPER-2247-05.patch,
> ZOOKEEPER-2247-06.patch, ZOOKEEPER-2247-07.patch, ZOOKEEPER-2247-09.patch,
> ZOOKEEPER-2247-10.patch, ZOOKEEPER-2247-11.patch, ZOOKEEPER-2247-12.patch,
> ZOOKEEPER-2247-b3.5.patch
>
>
> ZooKeeper service becomes unavailable when the leader fails to write the
> transaction log. Below are the exceptions:
> {code}
> 2015-08-14 15:41:18,556 [myid:100] - ERROR [SyncThread:100:ZooKeeperCriticalThread@48] - Severe unrecoverable error, from thread : SyncThread:100
> java.io.IOException: Input/output error
> at sun.nio.ch.FileDispatcherImpl.force0(Native Method)
> at sun.nio.ch.FileDispatcherImpl.force(FileDispatcherImpl.java:76)
> at sun.nio.ch.FileChannelImpl.force(FileChannelImpl.java:376)
> at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:331)
> at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:380)
> at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:563)
> at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:178)
> at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:113)
> 2015-08-14 15:41:18,559 [myid:100] - INFO [SyncThread:100:ZooKeeperServer$ZooKeeperServerListenerImpl@500] - Thread SyncThread:100 exits, error code 1
> 2015-08-14 15:41:18,559 [myid:100] - INFO [SyncThread:100:ZooKeeperServer@523] - shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO [SyncThread:100:SessionTrackerImpl@232] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO [SyncThread:100:LeaderRequestProcessor@77] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO [SyncThread:100:PrepRequestProcessor@1035] - Shutting down
> 2015-08-14 15:41:18,560 [myid:100] - INFO [SyncThread:100:ProposalRequestProcessor@88] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO [SyncThread:100:CommitProcessor@356] - Shutting down
> 2015-08-14 15:41:18,561 [myid:100] - INFO [CommitProcessor:100:CommitProcessor@191] - CommitProcessor exited loop!
> 2015-08-14 15:41:18,562 [myid:100] - INFO [SyncThread:100:Leader$ToBeAppliedRequestProcessor@915] - Shutting down
> 2015-08-14 15:41:18,562 [myid:100] - INFO [SyncThread:100:FinalRequestProcessor@646] - shutdown of request processor complete
> 2015-08-14 15:41:18,562 [myid:100] - INFO [SyncThread:100:SyncRequestProcessor@191] - Shutting down
> 2015-08-14 15:41:18,563 [myid:100] - INFO [ProcessThread(sid:100 cport:-1)::PrepRequestProcessor@159] - PrepRequestProcessor exited loop!
> {code}
> After this exception, the leader server still remains leader. After such an
> unrecoverable exception, the leader should go down and let one of the
> followers become leader.