[ https://issues.apache.org/jira/browse/ZOOKEEPER-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016228#comment-18016228 ]
Lin Changrui commented on ZOOKEEPER-4837:
-----------------------------------------

If the root cause is as reported, does that mean the sequencing of all TRUNC operations is incorrect? The data on each server after a TRUNC is then always inconsistent; for example, some transactions of an expired session are truncated but still visible. We have reproduced an issue like this in our cluster. The affected (network-issue) follower logged 'Digests are not matching' after the TRUNC. After a while, we rebooted all servers at the same time. The affected server became the leader after the election, I guess because it had the highest zxid. We can see some ephemeral nodes on the leader that had been deleted some time before, and they are invisible on the followers. I can't upload an attachment; part of the log looks like this:

{code:java}
WARN  (CommitProcessor:5,98) RateLogger,87 - Message:Digests are not matching. Value is Zxid. Value:51539607553
DEBUG (CommitProcessor:5,98) DataTree,1809 - Digest in log: 364897877017, actual tree: 352915824196
DEBUG (nioEventLoopGroup-4-3,59) NettyServerCnxnFactory$CnxnChannelHandler,318 - Received ReadEvent.ENABLE
ERROR (CommitProcessor:5,98) DataTree,1811 - First digest mismatch on txn: 360288272277504000,0,51539607553,1744568253693,-10 , 9000 , expected digest is 2,364897877017 , actual digest is 352915824196
{code}

> Network issue causes ephemeral node unremoved after the session expiration
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-4837
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4837
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum, server
>    Affects Versions: 3.9.2
>            Reporter: Dimas Shidqi Parikesit
>            Priority: Critical
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In our testing cluster with the latest ZooKeeper version (66202cb), we
> observed that sometimes an ephemeral node never gets deleted if there is a
> network issue during the PROPOSAL request, even after the session expires.
> This bug is essentially related to ZOOKEEPER-2355, but the issue was not
> entirely fixed by the previous patch. We also tested some related open PRs
> (e.g., https://github.com/apache/zookeeper/pull/2152 and
> https://github.com/apache/zookeeper/pull/1925), and confirmed the issue
> still exists after the proposed fixes.
>
> Steps to reproduce this bug:
> # Start a cluster with 3 servers: follower A, leader B, follower C
> # Open a zk client on server A
> # Create an ephemeral node with the client
> # Inject a network issue in all servers that causes a SocketTimeoutException
>   during readPacket if the packet is a PROPOSAL
> # Close the client
> # Wait until the cluster is stable (the leader will change between B and C
>   several times)
> # Remove the network issue from all servers
> # Check every server for the ephemeral node's existence. The ephemeral node
>   will still exist on server A, but servers B and C will no longer have it.
>
> Essentially, the bug is caused by loadDatabase loading a snapshot with a
> higher zxid than the truncated log, causing fastForwardFromEdits to fail when
> trying to replay the transactions. For example, if one of the followers has a
> lastProcessedZxid of 0x200000001 and a last snapshot of snapshot.200000001,
> and the leader sends a TRUNC with a zxid of 0x100000002, truncateLog will
> truncate the follower's log to 0x100000002. However, loadDatabase will still
> load snapshot.200000001, so when fastForwardFromEdits runs, it sets the data
> tree to 0x200000001 instead of 0x100000002.
>
> We also attached a test case to reproduce this issue. Note that this test
> case is still pretty flaky at the moment.
>
> We propose to fix this by loading the database from the last snapshot taken
> before the last truncated-log entry during truncateLog. See our attached PR.
> Of course, this may not be the ideal solution, and we welcome suggestions.
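The snapshot-selection logic behind the proposed fix can be sketched as a minimal, self-contained model. This is an illustration only: the class and method names below are hypothetical and do not reflect ZooKeeper's actual internal API; the zxid values mirror the example in the report.

```java
import java.util.Arrays;

public class SnapshotSelection {

    /** Models the current behavior: restore from the newest snapshot on disk. */
    static long newestSnapshot(long[] snapZxids) {
        long[] sorted = snapZxids.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length - 1];
    }

    /**
     * Models the proposed behavior: restore from the newest snapshot whose
     * zxid is at or below the truncation point, so that replaying the
     * surviving log starts from a base consistent with the truncated log.
     * Returns -1 if no usable snapshot exists.
     */
    static long snapshotBeforeTrunc(long[] snapZxids, long truncZxid) {
        long best = -1;
        for (long z : snapZxids) {
            if (z <= truncZxid && z > best) {
                best = z;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Follower state from the report: snapshots at 0x100000001 and
        // 0x200000001 on disk; the leader sends TRUNC @ 0x100000002.
        long[] snaps = {0x100000001L, 0x200000001L};
        long trunc = 0x100000002L;

        long current = newestSnapshot(snaps);
        long proposed = snapshotBeforeTrunc(snaps, trunc);

        // The current choice is ahead of the truncation point, so the data
        // tree ends up past the truncated log; the proposed choice is not.
        System.out.println("current  loads zxid 0x" + Long.toHexString(current)
            + (current > trunc ? " (past TRUNC point!)" : ""));
        System.out.println("proposed loads zxid 0x" + Long.toHexString(proposed));
    }
}
```

In this model, the current selection picks 0x200000001 (above the TRUNC zxid), reproducing the mismatch; the proposed selection picks 0x100000001, letting log replay bring the tree exactly to 0x100000002.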
> Some other potential solutions include:
> (1) Disable fastForwardDatabase in shutdown
> (2) Run setLastProcessedZxid at the end of the Learner's syncWithLeader
>     function if the packet is Leader.DIFF
>
> Your insights are very much appreciated. We will continue following up on
> this issue until it is resolved.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)