[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15058891#comment-15058891
 ] 

Flavio Junqueira commented on ZOOKEEPER-2104:
---------------------------------------------

The errors in this case don't say much, just that the server can't read/write 
from the socket. Is there any issue with your disks? Have you checked the disk 
traffic around the time this happened? I'm assumed this happened once, but let 
me know if this is reproducible.

> Sudden crash of all nodes in the cluster
> ----------------------------------------
>
>                 Key: ZOOKEEPER-2104
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2104
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.6
>            Reporter: Benjamin Jaton
>         Attachments: zookeeper-errors.txt, zookeeper-warns.txt
>
>
> In a 3 nodes ensemble, suddenly all the nodes seem to fail, displaying 
> "ZooKeeper is not running" messages.
> Not retry seems to be happening after that.
> This a request to understand what happened and probably to improve the logs 
> when it does.
> See logs below:
> NODE1:
> -- no log for several days before this --
> 2015-01-04 16:18:22,259 [myid:1] - WARN  [SyncThread:1:FileTxnLog@321] - 
> fsync-ing the write ahead log in SyncThread:1 took 11024ms which will 
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2015-01-04 16:18:22,380 [myid:1] - WARN  
> [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
>         at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>         at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:23,384 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:23,492 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:24,060 [myid:1] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> NODE2:
> -- no log for several days before this --
> 2015-01-04 16:18:21,899 [myid:3] - WARN  
> [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when 
> following the leader
> java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at 
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>         at 
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
>         at 
> org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
>         at 
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
>         at 
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
> 2015-01-04 16:18:22,760 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:22,801 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:22,886 [myid:3] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> NODE3 (leader):
> -- no log for several days before this --
> 2015-01-04 16:18:21,897 [myid:2] - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing 
> connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,898 [myid:2] - WARN  
> [LearnerHandler-/204.53.107.249:43402:LearnerHandler@646] - ******* GOODBYE 
> /204.53.107.249:43402 ********
> 2015-01-04 16:18:21,905 [myid:2] - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing 
> connection to peer due to transaction timeout.
> 2015-01-04 16:18:21,907 [myid:2] - WARN  
> [LearnerHandler-/204.53.107.247:45953:LearnerHandler@646] - ******* GOODBYE 
> /204.53.107.247:45953 ********
> 2015-01-04 16:18:21,918 [myid:2] - WARN  
> [LearnerHandler-/204.53.107.247:45953:LearnerHandler@658] - Ignoring 
> unexpected exception
> java.lang.InterruptedException
>         at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1219)
>         at 
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:340)
>         at 
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:338)
>         at 
> org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)
>         at 
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)
> 2015-01-04 16:18:23,003 [myid:2] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:23,007 [myid:2] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running
> 2015-01-04 16:18:23,115 [myid:2] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception 
> causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not 
> running



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to