Yongcheng Liu created ZOOKEEPER-2412:
----------------------------------------
Summary: leader zk out of memory, and leader db lastZxid is not
update when process set data.
Key: ZOOKEEPER-2412
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2412
Project: ZooKeeper
Issue Type: Bug
Environment: 3 zk, the follower (myid=6) server is injected into the
network delay fault for test, so this follower(id=6) data is behind leader,
leader need send SNAP to this follower every time.
Reporter: Yongcheng Liu
the time begin to have problem is 20:59:47
(1) out of memory log:
2016-03-24 23:01:21,355 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:35112:LearnerHandler@330] - Follower sid: 6 :
info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@5cf112b0
2016-03-24 23:01:21,355 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:35112:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 23:01:21,355 [myid:4] - WARN
[LearnerHandler-/192.168.50.26:35112:LearnerHandler@446] - Unhandled proposal
scenario
2016-03-24 23:01:21,355 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:35112:LearnerHandler@462] - Sending SNAP
2016-03-24 23:01:21,893 [myid:4] - INFO
[NIOServerCxn.Factory:/192.168.50.24:10540:NIOServerCnxnFactory@207] - Current
connection (from /192.168.50.22 Cnxns = 4; totalCnxns = 15)
2016-03-24 23:01:22,625 [myid:4] - WARN
[NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@832] - Connection
request from old client /192.168.50.22:49695; will be dropped if server is in
r-o mode
2016-03-24 23:01:23,283 [myid:4] - INFO
[QuorumPeer[myid=4]/192.168.50.24:10540:Leader@493] - Shutting down
2016-03-24 23:01:24,102 [myid:4] - INFO
[QuorumPeer[myid=4]/192.168.50.24:10540:Leader@499] - Shutdown called
2016-03-24 23:01:24,040 [myid:4] - INFO [SessionTracker:ZooKeeperServer@347] -
Expiring session 0x453a6dc5b7a007e, timeout of 3500ms exceeded
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler
in thread "NIOServerCxn.Factory:/192.168.50.24:10540"
2016-03-24 23:01:25,001 [myid:4] - WARN
[QuorumPeer[myid=4]/192.168.50.24:10540:QuorumPeer@827] - QuorumPeer main
thread exited
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler
in thread "LearnerHandler-/192.168.50.26:35112"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler
in thread "QuorumPeer[myid=4]/192.168.50.24:10540"
2016-03-24 23:01:24,227 [myid:4] - ERROR
[LearnerHandler-/192.168.50.26:35112:NIOServerCnxnFactory$1@44] - Thread
LearnerHandler Socket[addr=/192.168.50.26,port=35112,localport=10550]
tickOfNextAckDeadline:38310 synced?:true queuedPacketLength:7910 died
2016-03-24 23:01:28,492 [myid:4] - INFO [main:QuorumPeerMain@93] - Exiting
normally
(2) this leader is very strange, db lastzxid not update, log as follow (after
grep "Synchronizing with"), we can see max commit zxid from leader is not
update any more. From the beginning of 20:59:47, leader lastZxid not update.
2016-03-24 20:56:10,266 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:48439:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x90000333f minCommittedLog=0x90000314b
peerLastZxid=0x90000280a
2016-03-24 20:57:59,203 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:46956:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x900003398 minCommittedLog=0x9000031a4
peerLastZxid=0x90000280a
2016-03-24 20:59:47,928 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:20601:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:01:26,475 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:29622:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:03:16,552 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:35717:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:03:44,427 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:48197:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:05:01,125 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:57826:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:06:54,187 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:30137:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:07:42,780 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:24255:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:08:41,279 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:40909:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:10:23,137 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:64166:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:11:50,003 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:56070:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:12:11,956 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:41423:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:13:08,286 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:26757:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:13:59,960 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:62785:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:15:41,103 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:53141:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:16:11,125 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:39551:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:17:25,541 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:24638:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:18:25,039 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:54723:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
2016-03-24 21:19:04,148 [myid:4] - INFO
[LearnerHandler-/192.168.50.26:37450:LearnerHandler@385] - Synchronizing with
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad
peerLastZxid=0x90000280a
(3) we can see leader Snapshotting to the same file in different time, but
receive different TxnZxid(0x900003b70 and 0x9000049dd), this show leader has
not been updated lastZxid in db.
Snapshotting 1:
2016-03-24 21:33:27,214 [myid:4] - INFO [Snapshot Thread:FileTxnSnapLog@253] -
Snapshotting: 0x9000033a1 to
/opt/dsware/agent/zk/data/version-2/snapshot.9000033a1
2016-03-24 21:33:27,333 [myid:4] - INFO [SyncThread:4:FileTxnLog@199] -
Creating new log file: log.900003b70
Snapshotting 2:
2016-03-24 22:41:26,601 [myid:4] - INFO [Snapshot Thread:FileTxnSnapLog@253] -
Snapshotting: 0x9000033a1 to
/opt/dsware/agent/zk/data/version-2/snapshot.9000033a1
2016-03-24 22:41:26,662 [myid:4] - INFO [SyncThread:4:FileTxnLog@199] -
Creating new log file: log.9000049dd
(4) finally, this node(leader server) zxid is behind zk c client, log as follow:
2016-03-24 23:00:53,712 [myid:4] - WARN
[NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@832] - Connection
request from old client /192.168.50.23:35043; will be dropped if server is in
r-o mode
2016-03-24 23:00:53,713 [myid:4] - INFO
[NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@851] - Refusing
session request for client /192.168.50.23:35043 as it has seen zxid 0x900004e1f
our last zxid is 0x9000033a1 client must try another server
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)