[
https://issues.apache.org/jira/browse/ZOOKEEPER-1768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781196#comment-13781196
]
yuxin.yan commented on ZOOKEEPER-1768:
--------------------------------------
Firstly, thanks for your attention. May be i haven't explained the problem
clearly. The problem is like ZOOKEEPER-1115. I copy the log below:
2013-09-27 15:18:43,172 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot
/data/zookeeper/version-2/snapshot.200a74d3b
2013-09-27 15:19:10,358 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@740] - New
election. My id = 4, proposed zxid=0x200a74d3b
2013-09-27 15:19:10,359 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542]
- Notification: 4 (n.leader), 0x200a74d3b (n.zxid), 0x3 (n.round), LOOKING
(n.state), 4 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:19:10,359 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542]
- Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), LEADING
(n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:19:10,359 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542]
- Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), FOLLOWING
(n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:19:10,359 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542]
- Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), FOLLOWING
(n.state), 5 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:19:10,360 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@738] - FOLLOWING
2013-09-27 15:19:10,360 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162] - Created server
with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir
/data/zookeeper/version-2 snapdir /data/zookeeper/version-2
2013-09-27 15:19:10,360 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@63] - FOLLOWING - LEADER
ELECTION TOOK - 27191
2013-09-27 15:19:10,363 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Learner@322] - Getting a diff from the
leader 0x200a74e6d
2013-09-27 15:19:10,364 - WARN
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Learner@373] - Got zxid 0x200a74d3c
expected 0x1
2013-09-27 15:19:10,376 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240] - Snapshotting:
0x200a74e6d to /data/zookeeper/version-2/snapshot.200a74e6d
2013-09-27 15:19:13,935 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542]
- Notification: 1 (n.leader), 0x200a74d48 (n.zxid), 0x3 (n.round), LOOKING
(n.state), 1 (n.sid), 0x2 (n.peerEPoch), FOLLOWING (my state)
2013-09-27 15:19:35,856 - WARN
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
at
org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
at
org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:366)
at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:82)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2013-09-27 15:19:35,856 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
at
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
2013-09-27 15:19:35,857 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] -
Shutting down
2013-09-27 15:19:35,857 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] - shutting down
2013-09-27 15:19:35,857 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING
2013-09-27 15:19:35,861 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading snapshot
/data/zookeeper/version-2/snapshot.200a74e6d
2013-09-27 15:19:42,387 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /192.168.65.18:35227
2013-09-27 15:19:42,457 - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception
causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
running
2013-09-27 15:19:42,457 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket
connection for client /192.168.65.18:35227 (no session established for client)
2013-09-27 15:19:55,880 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /192.168.65.16:39338
2013-09-27 15:19:55,881 - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - Exception
causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not
running
2013-09-27 15:19:55,881 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket
connection for client /192.168.65.16:39338 (no session established for client)
2013-09-27 15:19:57,588 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /127.0.0.1:53888
2013-09-27 15:19:57,589 - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of
stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0,
likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:662)
2013-09-27 15:19:57,589 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket
connection for client /127.0.0.1:53888 (no session established for client)
2013-09-27 15:20:02,939 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@740] - New
election. My id = 4, proposed zxid=0x200a74e6d
2013-09-27 15:20:02,939 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542]
- Notification: 4 (n.leader), 0x200a74e6d (n.zxid), 0x3 (n.round), LOOKING
(n.state), 4 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:20:02,940 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542]
- Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), LEADING
(n.state), 2 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:20:02,940 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542]
- Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), FOLLOWING
(n.state), 3 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:20:02,940 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542]
- Notification: 2 (n.leader), 0x10015588a (n.zxid), 0x2 (n.round), FOLLOWING
(n.state), 5 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)
2013-09-27 15:20:02,940 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:QuorumPeer@738] - FOLLOWING
2013-09-27 15:20:02,941 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162] - Created server
with tickTime 2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir
/data/zookeeper/version-2 snapdir /data/zookeeper/version-2
2013-09-27 15:20:02,941 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@63] - FOLLOWING - LEADER
ELECTION TOOK - 27083
2013-09-27 15:20:02,944 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Learner@322] - Getting a diff from the
leader 0x200a74f72
2013-09-27 15:20:02,944 - WARN
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Learner@373] - Got zxid 0x200a74e6e
expected 0x1
2013-09-27 15:20:02,956 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240] - Snapshotting:
0x200a74f72 to /data/zookeeper/version-2/snapshot.200a74f72
2013-09-27 15:20:03,366 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted
socket connection from /192.168.65.16:39411
2013-09-27 15:20:03,367 - WARN
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@349] - caught end of
stream exception
EndOfStreamException: Unable to read additional data from client sessionid 0x0,
likely client has closed socket
at
org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:220)
at
org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
at java.lang.Thread.run(Thread.java:662)
2013-09-27 15:20:03,367 - INFO
[NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - Closed socket
connection for client /192.168.65.16:39411 (no session established for client)
2013-09-27 15:20:17,792 - INFO [WorkerReceiver[myid=4]:FastLeaderElection@542]
- Notification: 1 (n.leader), 0x200a74e7c (n.zxid), 0x3 (n.round), LOOKING
(n.state), 1 (n.sid), 0x2 (n.peerEPoch), FOLLOWING (my state)
2013-09-27 15:20:28,825 - WARN
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when
following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
at
org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:152)
at
org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:366)
at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:82)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:740)
2013-09-27 15:20:28,826 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
at
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:744)
2013-09-27 15:20:28,826 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] -
Shutting down
2013-09-27 15:20:28,826 - INFO
[QuorumPeer[myid=4]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] - shutting down
2013-09-27 15:2
That means the node always elect the leader and sync from the leader, and the
number of node 's snapshots gets larger and larger, so the disk space is
destined full. So now could you give me any help? Thanks!
yyx,
> Cluster fails election loop until the device is full
> ----------------------------------------------------
>
> Key: ZOOKEEPER-1768
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1768
> Project: ZooKeeper
> Issue Type: Bug
> Components: leaderElection
> Affects Versions: 3.4.5
> Reporter: yuxin.yan
> Attachments: zk_debug.log.2013-09-25.log, zoo.cfg
>
>
> Hi,
> I have a five nodes cluster versioned 3.4.5 and now i find one node is
> offline.
> Firstly i restart the node but i find that "Error contacting service. It is
> probably not running." and i find that the node always elect the leader and
> always sync the snapshot logs and the device will be full every ten mins.
> so could someone help me? i will put the log and zoo.cfg in the attachment.
> Thanks all.
> yyx,
--
This message was sent by Atlassian JIRA
(v6.1#6144)