[
https://issues.apache.org/jira/browse/ZOOKEEPER-1109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13054239#comment-13054239
]
Laxman commented on ZOOKEEPER-1109:
-----------------------------------
Reposting my comments and analysis.
I have also gone through Ted's earlier response on the disk-full scenario:
http://www.google.co.in/url?sa=t&source=web&cd=3&ved=0CCAQFjAC&url=http%3A%2F%2Fmail-archives.apache.org%2Fmod_mbox%2Fzookeeper-user%2F201106.mbox%2F%253CBANLkTimzQjXZvDKnP6xQLF9jHfsaz6JstA%40mail.gmail.com%253E&ei=FBQETvPWIcLNrQfk75yaDA&usg=AFQjCNFTkguyxTligpz1TZBmkqe9Osz-uw
We feel that even when the disk of one cluster member is full, the service
provided by the rest of the cluster should not be interrupted.
*Thread dumps*
The following thread dump shows the QuorumPeerMain shutdown hook thread waiting
indefinitely inside SyncRequestProcessor.shutdown().
{noformat}
"Thread-2" prio=10 tid=0x0810a400 nid=0x1695 in Object.wait() [0xac783000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor)
        at java.lang.Thread.join(Thread.java:1143)
        - locked <0xb804f5e8> (a org.apache.zookeeper.server.SyncRequestProcessor)
        at java.lang.Thread.join(Thread.java:1196)
        at org.apache.zookeeper.server.SyncRequestProcessor.shutdown(SyncRequestProcessor.java:171)
        at org.apache.zookeeper.server.quorum.ProposalRequestProcessor.shutdown(ProposalRequestProcessor.java:79)
        at org.apache.zookeeper.server.PrepRequestProcessor.shutdown(PrepRequestProcessor.java:513)
        at org.apache.zookeeper.server.ZooKeeperServer.shutdown(ZooKeeperServer.java:413)
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:411)
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126)
"SyncThread:2" prio=10 tid=0xad7fd800 nid=0x4acb in Object.wait() [0xac9ba000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1)
        at java.lang.Thread.join(Thread.java:1143)
        - locked <0xb8030d00> (a org.apache.zookeeper.server.quorum.QuorumPeerMain$1)
        at java.lang.Thread.join(Thread.java:1196)
        at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:79)
        at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:24)
        at java.lang.Shutdown.runHooks(Shutdown.java:79)
        at java.lang.Shutdown.sequence(Shutdown.java:123)
        at java.lang.Shutdown.exit(Shutdown.java:168)
        - locked <0xf01ff3b0> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Runtime.exit(Runtime.java:90)
        at java.lang.System.exit(System.java:904)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:149)
{noformat}
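The cycle in the dump above can be reproduced in isolation. The following is only a minimal sketch, not ZooKeeper code: the class and method names are hypothetical, plain threads stand in for SyncThread and the QuorumPeerMain$1 hook, and a bounded wait plus an interrupt are added so the demo terminates instead of hanging the way the real JVM does.

```java
class ExitJoinCycle {
    /**
     * Builds the same wait cycle as in the thread dump and reports whether
     * both threads are still blocked after waitMillis.
     */
    static boolean cycleStillBlocked(long waitMillis) throws InterruptedException {
        final Thread[] sync = new Thread[1];

        // Stand-in for the shutdown hook (QuorumPeerMain$1): it ends up in
        // SyncRequestProcessor.shutdown(), which joins the sync thread.
        final Thread hook = new Thread(() -> {
            try {
                sync[0].join();
            } catch (InterruptedException e) {
                // Interrupted by the demo below to break the cycle.
            }
        }, "hook");

        // Stand-in for SyncThread: System.exit() starts the hooks and then
        // joins them (ApplicationShutdownHooks.runHooks in the dump).
        sync[0] = new Thread(() -> {
            hook.start();
            try {
                hook.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "sync");

        sync[0].start();
        sync[0].join(waitMillis);                 // bounded wait, demo only
        boolean blocked = sync[0].isAlive() && hook.isAlive();

        hook.interrupt();                         // break the cycle so the demo ends
        hook.join();
        sync[0].join();
        return blocked;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cycle still blocked: " + cycleStillBlocked(200));
    }
}
```

In the real server nothing interrupts the hook, so both threads wait on each other forever and the JVM never exits.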
*Logs*
{noformat}
2011-06-21 10:09:59,730 - FATAL [SyncThread:2:SyncRequestProcessor@148] - Severe unrecoverable error, exiting
java.io.IOException: No space left on device
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:260)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
        at org.apache.zookeeper.server.persistence.FileTxnLog.commit(FileTxnLog.java:305)
        at org.apache.zookeeper.server.persistence.FileTxnSnapLog.commit(FileTxnSnapLog.java:324)
        at org.apache.zookeeper.server.ZKDatabase.commit(ZKDatabase.java:484)
        at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:158)
        at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:98)
2011-06-21 10:09:59,732 - INFO [Thread-2:QuorumPeer@691] - The Quorum server is going for shutdown
2011-06-21 10:09:59,732 - INFO [Thread-2:Leader@393] - Shutdown called
java.lang.Exception: shutdown Leader! reason: quorum Peer shutdown
        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:393)
        at org.apache.zookeeper.server.quorum.QuorumPeer.shutdown(QuorumPeer.java:694)
        at org.apache.zookeeper.server.quorum.QuorumPeerMain$1.run(QuorumPeerMain.java:126)
2011-06-21 10:09:59,733 - INFO [Thread-6:Leader$LearnerCnxAcceptor@243] - exception while shutting down acceptor: java.net.SocketException: Socket closed
2011-06-21 10:09:59,758 - INFO [ProcessThread:-1:PrepRequestProcessor@120] - PrepRequestProcessor exited loop!
2011-06-21 10:09:59,758 - INFO [CommitProcessor:2:CommitProcessor@150] - CommitProcessor exited loop!
2011-06-21 10:09:59,759 - INFO [Thread-2:FinalRequestProcessor@379] - shutdown of request processor complete
2011-06-21 10:10:00,000 - INFO [SessionTracker:SessionTrackerImpl@165] - SessionTrackerImpl exited loop!
{noformat}
> Zookeeper service is down when SyncRequestProcessor meets any exception.
> ------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1109
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1109
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.3.0, 3.3.1, 3.3.2, 3.3.3
> Reporter: Laxman
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> *Problem* ZooKeeper does not shut down completely when the dataDir disk is
> full, and the ZK cluster goes into an unserviceable state.
>
> *Scenario*
> If the leader's disk is filled up, ZooKeeper tries to shut down, but it
> waits indefinitely while shutting down the SyncRequestProcessor thread.
> *Root Cause*
> this.join() is invoked on the same thread that triggered System.exit(11).
> When the disk fills up, the SyncRequestProcessor thread gets the exception
> 'No space left on device' and invokes System.exit(11) (the logs show the
> same). Before the JVM exits, it runs the QuorumPeerMain shutdown hook, and
> the flow reaches SyncRequestProcessor.shutdown(). There, this.join() waits
> for the very thread that invoked System.exit(11), which is itself waiting
> for the shutdown hook to finish.
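One possible way to avoid the cycle described in the root cause is to have the worker record that it is exiting before it triggers the exit, so that shutdown() can skip the join. The following is only a sketch under that assumption and is not a patch for this issue: the class, the `running` flag, and the simulated exit (a plain hook thread instead of System.exit) are all hypothetical.

```java
class GuardedShutdownSketch {
    static class Worker extends Thread {
        // Hypothetical flag: cleared by the worker before it initiates exit.
        private volatile boolean running = true;

        Worker() {
            super("worker");
        }

        @Override
        public void run() {
            // Fatal-error path: mark this thread as exiting BEFORE the
            // simulated exit sequence starts the hook.
            running = false;
            Thread hook = new Thread(this::shutdown, "hook");
            hook.start();
            try {
                hook.join();   // simulated System.exit(): wait for the hook
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }

        void shutdown() {
            if (!running) {
                // The worker itself initiated the exit and is waiting for the
                // hook, so joining it here would never return.
                return;
            }
            try {
                join();        // normal shutdown path: wait for the worker
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Returns true if the worker (and therefore the hook) finished in time. */
    static boolean finishesWithin(long millis) throws InterruptedException {
        Worker w = new Worker();
        w.start();
        w.join(millis);
        return !w.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("finished: " + finishesWithin(2000));
    }
}
```

A join with a timeout in shutdown() would be another way to bound the wait; either variant only removes the hang and does not by itself address keeping the rest of the cluster in service.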