[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yongcheng Liu updated ZOOKEEPER-2412:
-------------------------------------
    Affects Version/s: 3.4.6

> leader zk out of memory,  and leader db lastZxid is not update when process 
> set data. 
> --------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2412
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2412
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.6
>         Environment: 3 zk, the follower (myid=6) server is injected into the 
> network delay fault for test, so this follower(id=6) data is behind leader, 
> leader need send SNAP to this follower every time.
>            Reporter: Yongcheng Liu
>
> the time begin to have problem is 20:59:47
> (1) out of memory log:
> 2016-03-24 23:01:21,355 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:35112:LearnerHandler@330] - Follower sid: 6 : 
> info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@5cf112b0
> 2016-03-24 23:01:21,355 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:35112:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 23:01:21,355 [myid:4] - WARN  
> [LearnerHandler-/192.168.50.26:35112:LearnerHandler@446] - Unhandled proposal 
> scenario
> 2016-03-24 23:01:21,355 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:35112:LearnerHandler@462] - Sending SNAP
> 2016-03-24 23:01:21,893 [myid:4] - INFO  
> [NIOServerCxn.Factory:/192.168.50.24:10540:NIOServerCnxnFactory@207] - 
> Current connection (from /192.168.50.22 Cnxns = 4; totalCnxns = 15)
> 2016-03-24 23:01:22,625 [myid:4] - WARN  
> [NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@832] - Connection 
> request from old client /192.168.50.22:49695; will be dropped if server is in 
> r-o mode
> 2016-03-24 23:01:23,283 [myid:4] - INFO  
> [QuorumPeer[myid=4]/192.168.50.24:10540:Leader@493] - Shutting down
> 2016-03-24 23:01:24,102 [myid:4] - INFO  
> [QuorumPeer[myid=4]/192.168.50.24:10540:Leader@499] - Shutdown called
> 2016-03-24 23:01:24,040 [myid:4] - INFO  [SessionTracker:ZooKeeperServer@347] 
> - Expiring session 0x453a6dc5b7a007e, timeout of 3500ms exceeded
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "NIOServerCxn.Factory:/192.168.50.24:10540"
> 2016-03-24 23:01:25,001 [myid:4] - WARN  
> [QuorumPeer[myid=4]/192.168.50.24:10540:QuorumPeer@827] - QuorumPeer main 
> thread exited
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "LearnerHandler-/192.168.50.26:35112"
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "QuorumPeer[myid=4]/192.168.50.24:10540"
> 2016-03-24 23:01:24,227 [myid:4] - ERROR 
> [LearnerHandler-/192.168.50.26:35112:NIOServerCnxnFactory$1@44] - Thread 
> LearnerHandler Socket[addr=/192.168.50.26,port=35112,localport=10550] 
> tickOfNextAckDeadline:38310 synced?:true queuedPacketLength:7910 died
> 2016-03-24 23:01:28,492 [myid:4] - INFO  [main:QuorumPeerMain@93] - Exiting 
> normally
> (2) this leader is very strange, db lastzxid not update, log as follow (after 
> grep "Synchronizing with"), we can see max commit zxid from leader is not 
> update any more. From the beginning of 20:59:47, leader lastZxid not update.
> 2016-03-24 20:56:10,266 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:48439:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x90000333f minCommittedLog=0x90000314b 
> peerLastZxid=0x90000280a
> 2016-03-24 20:57:59,203 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:46956:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x900003398 minCommittedLog=0x9000031a4 
> peerLastZxid=0x90000280a
> 2016-03-24 20:59:47,928 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:20601:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:01:26,475 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:29622:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:03:16,552 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:35717:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:03:44,427 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:48197:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:05:01,125 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:57826:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:06:54,187 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:30137:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:07:42,780 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:24255:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:08:41,279 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:40909:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:10:23,137 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:64166:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:11:50,003 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:56070:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:12:11,956 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:41423:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:13:08,286 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:26757:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:13:59,960 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:62785:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:15:41,103 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:53141:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:16:11,125 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:39551:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:17:25,541 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:24638:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:18:25,039 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:54723:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> 2016-03-24 21:19:04,148 [myid:4] - INFO  
> [LearnerHandler-/192.168.50.26:37450:LearnerHandler@385] - Synchronizing with 
> Follower sid: 6 maxCommittedLog=0x9000033a1 minCommittedLog=0x9000031ad 
> peerLastZxid=0x90000280a
> (3) we can see leader Snapshotting to the same file in different time, but 
> receive different TxnZxid(0x900003b70 and 0x9000049dd), this show leader has 
> not been updated lastZxid in db.
> Snapshotting 1:
> 2016-03-24 21:33:27,214 [myid:4] - INFO  [Snapshot Thread:FileTxnSnapLog@253] 
> - Snapshotting: 0x9000033a1 to 
> /opt/dsware/agent/zk/data/version-2/snapshot.9000033a1
> 2016-03-24 21:33:27,333 [myid:4] - INFO  [SyncThread:4:FileTxnLog@199] - 
> Creating new log file: log.900003b70
> Snapshotting 2:
> 2016-03-24 22:41:26,601 [myid:4] - INFO  [Snapshot Thread:FileTxnSnapLog@253] 
> - Snapshotting: 0x9000033a1 to 
> /opt/dsware/agent/zk/data/version-2/snapshot.9000033a1
> 2016-03-24 22:41:26,662 [myid:4] - INFO  [SyncThread:4:FileTxnLog@199] - 
> Creating new log file: log.9000049dd
> (4) finally, this node(leader server) zxid is behind zk c client, log as 
> follow:
> 2016-03-24 23:00:53,712 [myid:4] - WARN  
> [NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@832] - Connection 
> request from old client /192.168.50.23:35043; will be dropped if server is in 
> r-o mode
> 2016-03-24 23:00:53,713 [myid:4] - INFO  
> [NIOServerCxn.Factory:/192.168.50.24:10540:ZooKeeperServer@851] - Refusing 
> session request for client /192.168.50.23:35043 as it has seen zxid 
> 0x900004e1f our last zxid is 0x9000033a1 client must try another server



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to