[
https://issues.apache.org/jira/browse/ZOOKEEPER-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15395335#comment-15395335
]
Daniel Freudenberger commented on ZOOKEEPER-2104:
-------------------------------------------------
[~fpj] Of course I read through the comments. ZooKeeper recovered after ~15
minutes. 20 minutes later (right now) it crashed again and is flooding the log
file with the following errors:
2016-07-27 11:49:39,829 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket
connection for client /10.41.199.233:60522 (no session established for client)
2016-07-27 11:49:39,864 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /10.41.199.201:60524
2016-07-27 11:49:39,865 [myid:2] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception
causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
running
2016-07-27 11:49:39,865 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket
connection for client /10.41.199.201:60524 (no session established for client)
2016-07-27 11:49:40,095 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /10.41.199.217:37339
2016-07-27 11:49:40,096 [myid:2] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception
causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
running
2016-07-27 11:49:40,098 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket
connection for client /10.41.199.217:37339 (no session established for client)
2016-07-27 11:49:40,245 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /10.41.199.63:33360
2016-07-27 11:49:40,245 [myid:2] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception
causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
running
2016-07-27 11:49:40,245 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket
connection for client /10.41.199.63:33360 (no session established for client)
2016-07-27 11:49:40,317 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /10.41.199.111:34965
2016-07-27 11:49:40,320 [myid:2] - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception
causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
running
2016-07-27 11:49:40,320 [myid:2] - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket
connection for client /10.41.199.111:34965 (no session established for client)
2016-07-27 11:49:40,346 [myid:2] - WARN
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
following the leader
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
at java.io.DataInputStream.readInt(DataInputStream.java:387)
at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
at
org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
at
org.apache.zookeeper.server.quorum.Learner.registerWithLeader(Learner.java:272)
at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:72)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2016-07-27 11:49:40,347 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
at
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
2016-07-27 11:49:40,347 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] -
Shutting down
2016-07-27 11:49:40,347 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] - shutting down
2016-07-27 11:49:40,348 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING
2016-07-27 11:49:40,352 [myid:2] - INFO
[QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot
/var/lib/zookeeper/version-2/snapshot.3300799266
The size of the snapshot (/var/lib/zookeeper/version-2/snapshot.3300799266) is
147 MB. Not sure if this is considered "large" for ZooKeeper. initLimit is set
to 10, tickTime is 2000.
What else can I provide?
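
For reference, a rough sketch of what those two settings work out to. It
assumes the documented meaning of initLimit (the number of ticks a follower is
allowed for connecting and syncing to a leader) and, as a deliberate
simplification, that the whole 147 MB snapshot would have to be handled inside
that window. The helper names are made up for illustration and are not part of
ZooKeeper.

```python
# Values quoted in the comment above.
TICK_TIME_MS = 2000      # tickTime, in milliseconds
INIT_LIMIT_TICKS = 10    # initLimit, in ticks
SNAPSHOT_MB = 147        # reported size of snapshot.3300799266


def init_budget_seconds(tick_time_ms, init_limit_ticks):
    """Time budget for the initial connect/sync phase: initLimit * tickTime."""
    return tick_time_ms * init_limit_ticks / 1000.0


def min_throughput_mb_per_s(snapshot_mb, budget_s):
    """Sustained rate needed to process the snapshot within the budget.

    Simplifying assumption: the full snapshot must fit into the window.
    """
    return snapshot_mb / budget_s


budget_s = init_budget_seconds(TICK_TIME_MS, INIT_LIMIT_TICKS)
required = min_throughput_mb_per_s(SNAPSHOT_MB, budget_s)

print(f"init budget: {budget_s:.0f} s")
print(f"minimum sustained rate: {required:.2f} MB/s")
```

With these numbers the window is 20 seconds. Whether that is actually the
limiting factor here is not established by the log above, where the read
timeout is hit in Learner.registerWithLeader.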
> Sudden crash of all nodes in the cluster
> ----------------------------------------
>
> Key: ZOOKEEPER-2104
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2104
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.6
> Reporter: Benjamin Jaton
> Attachments: zookeeper-errors.txt, zookeeper-warns.txt
>
>
> In a 3-node ensemble, suddenly all the nodes seem to fail, displaying
> "ZooKeeper is not running" messages.
> No retry seems to happen after that.
> This is a request to understand what happened and possibly to improve the logs
> when it does.
> See logs below:
> NODE1:
> -- no log for several days before this --
> 2015-01-04 16:18:22,259 [myid:1] - WARN [SyncThread:1:FileTxnLog@321] -
> fsync-ing the write ahead log in SyncThread:1 took 11024ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2015-01-04 16:18:22,380 [myid:1] - WARN
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:23,384 [myid:1] - WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
> running
> 2015-01-04 16:18:23,492 [myid:1] - WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
> running
> 2015-01-04 16:18:24,060 [myid:1] - WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
> running
> NODE2:
> -- no log for several days before this --
> 2015-01-04 16:18:21,899 [myid:3] - WARN
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
> following the leader
> java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:392)
> at
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> at
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
> at
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
> at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
> at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:22,760 [myid:3] - WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
> running
> 2015-01-04 16:18:22,801 [myid:3] - WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
> running
> 2015-01-04 16:18:22,886 [myid:3] - WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
> running
> NODE3 (leader):
> -- no log for several days before this --
> 2015-01-04 16:18:21,897 [myid:2] - WARN
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing
> connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,898 [myid:2] - WARN
> [LearnerHandler-/204.53.107.249:43402:LearnerHandler@646] - ******* GOODBYE
> /204.53.107.249:43402 ********
> 2015-01-04 16:18:21,905 [myid:2] - WARN
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing
> connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,907 [myid:2] - WARN
> [LearnerHandler-/204.53.107.247:45953:LearnerHandler@646] - ******* GOODBYE
> /204.53.107.247:45953 ********
> 2015-01-04 16:18:21,918 [myid:2] - WARN
> [LearnerHandler-/204.53.107.247:45953:LearnerHandler@658] - Ignoring
> unexpected exception
> java.lang.InterruptedException
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
> at
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
> at
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
> at
> org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)
> at
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)
> 2015-01-04 16:18:23,003 [myid:2] - WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
> running
> 2015-01-04 16:18:23,007 [myid:2] - WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
> running
> 2015-01-04 16:18:23,115 [myid:2] - WARN
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
> running