Jon,
What's the size of the snapshot?

And what are your config values for the following?

1) initLimit
2) syncLimit
3) tickTime
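
For context on why those three matter, here is a rough sketch, assuming the documented semantics: tickTime is in milliseconds, initLimit is the number of ticks a follower gets to connect and sync to the leader, and syncLimit is the number of ticks it may lag behind afterwards. The function names are made up for illustration, and the config values are the ones shipped in zoo_sample.cfg, not necessarily yours. The two timestamps are taken from the leader log quoted below.

```python
from datetime import datetime

LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # timestamp layout used in the quoted logs


def quorum_windows_ms(tick_time_ms, init_limit, sync_limit):
    """Return (initial sync window, steady-state sync window) in milliseconds."""
    return tick_time_ms * init_limit, tick_time_ms * sync_limit


def elapsed_seconds(start, end):
    """Seconds between two log timestamps."""
    delta = datetime.strptime(end, LOG_TS_FORMAT) - datetime.strptime(start, LOG_TS_FORMAT)
    return delta.total_seconds()


if __name__ == "__main__":
    # Illustrative values from zoo_sample.cfg; substitute the real config.
    init_ms, sync_ms = quorum_windows_ms(tick_time_ms=2000, init_limit=10, sync_limit=5)
    print(f"initial sync window: {init_ms / 1000:.0f}s, sync window: {sync_ms / 1000:.0f}s")

    # Leader log: "Sending snapshot" to "Read timed out".
    took = elapsed_seconds("2011-10-24 11:53:23,109", "2011-10-24 11:54:29,713")
    print(f"snapshot send to timeout: {took:.1f}s")
```

If the snapshot is large enough that transferring and writing it takes longer than the initial sync window, raising initLimit is usually the first thing to try.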

thanks
mahadev

On Mon, Oct 24, 2011 at 11:09 AM, Jon King <[email protected]> wrote:

> Hi All,
>
> It looks like one of our ZK quorum servers cannot sync with the leader
> anymore.  The leader logs show "Read timed out" errors and the follower is
> showing a "Broken pipe" at the same time.
>
> Follower logs
>
> 2011-10-24 11:53:23,110 - INFO  [QuorumPeer:/0.0.0.0:2181:FileSnap@82] -
> Reading snapshot /var/zookeeper/version-2/snapshot.10000de07
> 2011-10-24 11:53:32,792 - WARN  [QuorumPeer:/0.0.0.0:2181:QuorumPeer@497]
> - Unable to load database
> java.io.IOException: Transaction log:
> /var/zookeeper/version-2/log.10000de08 has invalid magic number 0 !=
> 1514884167
>         at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.inStreamCreated(FileTxnLog.java:510)
>         at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.createInputArchive(FileTxnLog.java:527)
>         at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.goToNextLog(FileTxnLog.java:493)
>         at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.init(FileTxnLog.java:475)
>         at
> org.apache.zookeeper.server.persistence.FileTxnLog$FileTxnIterator.<init>(FileTxnLog.java:454)
>         at
> org.apache.zookeeper.server.persistence.FileTxnLog.read(FileTxnLog.java:325)
>         at
> org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:126)
>         at
> org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:222)
>         at
> org.apache.zookeeper.server.quorum.QuorumPeer.getLastLoggedZxid(QuorumPeer.java:493)
>         at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:69)
>         at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
> 2011-10-24 11:53:32,793 - INFO  [QuorumPeer:/0.0.0.0:2181:Learner@294] -
> Getting a snapshot from leader
> 2011-10-24 11:54:19,716 - INFO  [QuorumPeer:/0.0.0.0:2181:Learner@325] -
> Setting leader epoch 1
> 2011-10-24 11:54:19,717 - INFO  [QuorumPeer:/0.0.0.0:2181
> :FileTxnSnapLog@208] - Snapshotting: 10000de0d
> 2011-10-24 11:54:44,412 - WARN  [QuorumPeer:/0.0.0.0:2181:Follower@82] -
> Exception when following the leader
> java.net.SocketException: Broken pipe
>         at java.net.SocketOutputStream.socketWrite0(Native Method)
>         at java.net.SocketOutputStream.socketWrite(Unknown Source)
>         at java.net.SocketOutputStream.write(Unknown Source)
>         at java.io.BufferedOutputStream.flushBuffer(Unknown Source)
>         at java.io.BufferedOutputStream.flush(Unknown Source)
>         at
> org.apache.zookeeper.server.quorum.Learner.writePacket(Learner.java:134)
>         at
> org.apache.zookeeper.server.quorum.Learner.ping(Learner.java:418)
>         at
> org.apache.zookeeper.server.quorum.Follower.processPacket(Follower.java:108)
>         at
> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:79)
>         at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
> 2011-10-24 11:54:45,784 - INFO  [QuorumPeer:/0.0.0.0:2181:Follower@165] -
> shutdown called
> java.lang.Exception: shutdown Follower
>         at
> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
>         at
> org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
> 2011-10-24 11:54:45,785 - INFO  [QuorumPeer:/0.0.0.0:2181
> :FinalRequestProcessor@378] - shutdown of request processor complete
>
>
> Leader Logs
>
> 2011-10-24 11:53:13,626 - INFO  [WorkerReceiver
> Thread:FastLeaderElection@496] - Notification: 3 (n.leader), -1 (n.zxid),
> 2 (n.round), LOOKING (n.state), 3 (n.sid), LEADING (my state)
> 2011-10-24 11:53:23,109 - INFO  [LearnerHandler-/10.3.4.156:41450
> :LearnerHandler@249] - Follower sid: 3 : info :
> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@783c342b
> 2011-10-24 11:53:23,109 - INFO  [LearnerHandler-/10.3.4.156:41450
> :LearnerHandler@319] - Sending snapshot last zxid of peer is
> 0xffffffffffffffff  zxid of leader is 0x10000de0dsent zxid of db as
> 0x10000de0d
> 2011-10-24 11:54:29,713 - ERROR [LearnerHandler-/10.3.4.156:41450
> :LearnerHandler@461] - Unexpected exception causing shutdown while sock
> still open
> java.net.SocketTimeoutException: Read timed out
>         at java.net.SocketInputStream.socketRead0(Native Method)
>         at java.net.SocketInputStream.read(Unknown Source)
>         at java.io.BufferedInputStream.fill(Unknown Source)
>         at java.io.BufferedInputStream.read(Unknown Source)
>         at java.io.DataInputStream.readInt(Unknown Source)
>         at
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>         at
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
>         at
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>         at
> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:357)
> 2011-10-24 11:54:29,714 - WARN  [LearnerHandler-/10.3.4.156:41450
> :LearnerHandler@474] - ******* GOODBYE /10.3.4.156:41450 ********
> 2011-10-24 11:54:49,634 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket
> connection from /10.2.16.131:41048
> 2011-10-24 11:54:49,635 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1237] - Processing ruok command from /
> 10.2.16.131:41048
> 2011-10-24 11:54:49,635 - INFO  [Thread-7102:NIOServerCnxn@1435] - Closed
> socket connection for client /10.2.16.131:41048 (no session established
> for client)
> 2011-10-24 11:54:49,661 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn$Factory@251] - Accepted socket
> connection from /10.2.16.131:41052
> 2011-10-24 11:54:49,661 - INFO  [NIOServerCxn.Factory:
> 0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1237] - Processing stat command from /
> 10.2.16.131:41052
> 2011-10-24 11:54:49,661 - INFO  [Thread-7103:NIOServerCnxn$StatCommand@1153]
> - Stat command output
> 2011-10-24 11:54:49,662 - INFO  [Thread-7103:NIOServerCnxn@1435] - Closed
> socket connection for client /10.2.16.131:41052 (no session established
> for client)
>
> --
>
> *Jon King*
>
> Database Administrator
>
> Office:   303.228.5108
>
> Mobile:  303.810.8200
>
> [email protected]
>
