Yongcheng Liu created ZOOKEEPER-2412:
----------------------------------------

             Summary: leader zk out of memory,  and leader db lastZxid is not 
update when process set data. 
                 Key: ZOOKEEPER-2412
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2412
             Project: ZooKeeper
          Issue Type: Bug
         Environment: 3 zk, the follower (myid=6) server is injected into the 
network delay fault for test, so this follower(id=6) data is behind leader, 
leader need send SNAP to this follower every time.
            Reporter: Yongcheng Liu


the time begin to have problem is 20:59:47

(1) out of memory log:

2016-03-24 23:01:21,355 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:35112:LearnerHandler@330] - Follower sid: 6 : 
info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@5cf112b0
2016-03-24 23:01:21,355 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:35112:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 23:01:21,355 [myid:4] - WARN  
[LearnerHandler-/192.168.50.26:35112:LearnerHandler@446] - Unhandled proposal 
scenario
2016-03-24 23:01:21,355 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:35112:LearnerHandler@462] - Sending SNAP
2016-03-24 23:01:21,893 [myid:4] - INFO  
[NIOServerCxn.Factory:/192.168.50.24:10540:NIOServerCnxnFactory@207] - Current 
connection (from /192.168.50.22 Cnxns = 4; totalCnxns = 15)
2016-03-24 23:01:22,625 [myid:4] - WARN  
[NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@832] - Connection 
request from old client /192.168.50.22:49695; will be dropped if server is in 
r-o mode
2016-03-24 23:01:23,283 [myid:4] - INFO  
[QuorumPeer[myid=4]/192.168.50.24:10540:Leader@493] - Shutting down
2016-03-24 23:01:24,102 [myid:4] - INFO  
[QuorumPeer[myid=4]/192.168.50.24:10540:Leader@499] - Shutdown called
2016-03-24 23:01:24,040 [myid:4] - INFO  [SessionTracker:ZooKeeperServer@347] - 
Expiring session 0x453a6dc5b7a007e, timeout of 3500ms exceeded

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler 
in thread "NIOServerCxn.Factory:/192.168.50.24:10540"
2016-03-24 23:01:25,001 [myid:4] - WARN  
[QuorumPeer[myid=4]/192.168.50.24:10540:QuorumPeer@827] - QuorumPeer main 
thread exited

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler 
in thread "LearnerHandler-/192.168.50.26:35112"

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler 
in thread "QuorumPeer[myid=4]/192.168.50.24:10540"
2016-03-24 23:01:24,227 [myid:4] - ERROR 
[LearnerHandler-/192.168.50.26:35112:NIOServerCnxnFactory$1@44] - Thread 
LearnerHandler Socket[addr=/192.168.50.26,port=35112,localport=10550] 
tickOfNextAckDeadline:38310 synced?:true queuedPacketLength:7910 died
2016-03-24 23:01:28,492 [myid:4] - INFO  [main:QuorumPeerMain@93] - Exiting 
normally



(2) this leader is very strange, db lastzxid not update, log as follow (after 
grep "Synchronizing with"), we can see max commit zxid from leader is not 
update any more. From the beginning of 20:59:47, leader lastZxid not update.

2016-03-24 20:56:10,266 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:48439:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x90000333f minCommittedLog=0x90000314b 
peerLastZxid=0x90000280a
2016-03-24 20:57:59,203 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:46956:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x900003398 minCommittedLog=0x9000031a4 
peerLastZxid=0x90000280a
2016-03-24 20:59:47,928 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:20601:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:01:26,475 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:29622:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:03:16,552 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:35717:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:03:44,427 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:48197:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:05:01,125 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:57826:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:06:54,187 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:30137:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:07:42,780 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:24255:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:08:41,279 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:40909:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:10:23,137 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:64166:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:11:50,003 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:56070:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:12:11,956 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:41423:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:13:08,286 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:26757:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:13:59,960 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:62785:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:15:41,103 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:53141:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:16:11,125 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:39551:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:17:25,541 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:24638:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:18:25,039 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:54723:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a
2016-03-24 21:19:04,148 [myid:4] - INFO  
[LearnerHandler-/192.168.50.26:37450:LearnerHandler@385] - Synchronizing with 
Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
peerLastZxid=0x90000280a


(3) we can see leader Snapshotting to the same file in different time, but 
receive different TxnZxid(0x900003b70 and 0x9000049dd), this show leader has 
not been updated lastZxid in db.

Snapshotting 1:
2016-03-24 21:33:27,214 [myid:4] - INFO  [Snapshot Thread:FileTxnSnapLog@253] - 
Snapshotting: 0x9000033a1 to 
/opt/dsware/agent/zk/data/version-2/snapshot.9000033a1
2016-03-24 21:33:27,333 [myid:4] - INFO  [SyncThread:4:FileTxnLog@199] - 
Creating new log file: log.900003b70

Snapshotting 2:
2016-03-24 22:41:26,601 [myid:4] - INFO  [Snapshot Thread:FileTxnSnapLog@253] - 
Snapshotting: 0x9000033a1 to 
/opt/dsware/agent/zk/data/version-2/snapshot.9000033a1
2016-03-24 22:41:26,662 [myid:4] - INFO  [SyncThread:4:FileTxnLog@199] - 
Creating new log file: log.9000049dd


(4) finally, this node(leader server) zxid is behind zk c client, log as follow:

2016-03-24 23:00:53,712 [myid:4] - WARN  
[NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@832] - Connection 
request from old client /192.168.50.23:35043; will be dropped if server is in 
r-o mode
2016-03-24 23:00:53,713 [myid:4] - INFO  
[NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@851] - Refusing 
session request for client /192.168.50.23:35043 as it has seen zxid 0x900004e1f 
our last zxid is 0x9000033a1 client must try another server



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to