Github user lvfangmin commented on the issue:
https://github.com/apache/zookeeper/pull/353
@anmolnar thanks for reviewing. The testNoLogBeforeLeaderEstablishment test was
introduced by mistake during the rebase; sorry for the confusion. I've fixed
the other test so that it catches the issue I'm trying to reproduce, by
removing the zk.dontReconnect() statement.
Here is the problem I'm trying to address in this diff:
1. The client tries to renew session A on server S1.
2. S1 is slow to send the revalidate request to the leader (e.g. because of a
full GC, or high network delay due to packet loss).
3. The client times out renewing session A on S1 and connects to S2.
4. S2 is faster than S1: it revalidates the session with the leader and
becomes the session owner.
5. S1's revalidate request finally reaches the leader, and the leader updates
the owner to S1.
6. From then on, every request from this client gets a session moved error,
even though S2 is the server that should own the session.
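The race in steps 1-6 can be replayed with a toy model. This is a minimal sketch, not ZooKeeper's code: the `Leader` class, its `revalidate`/`checkSession` methods, and the string server IDs are hypothetical stand-ins for the leader's session tracker. What it shows is that if the leader simply records whichever revalidation arrives last, S1's delayed request overwrites S2's ownership, and the client connected to S2 is then rejected.

```java
import java.util.HashMap;
import java.util.Map;

public class SessionMoveRace {
    // Hypothetical stand-in for the leader's session tracker: sessionId -> owning server.
    static class Leader {
        private final Map<Long, String> owners = new HashMap<>();

        // Revalidations are applied in arrival order, so the last one wins.
        void revalidate(long sessionId, String server) {
            owners.put(sessionId, server);
        }

        // True if a request arriving via `server` is accepted, false for "session moved".
        boolean checkSession(long sessionId, String server) {
            return server.equals(owners.get(sessionId));
        }
    }

    // Replays steps 1-6; returns whether the client's next request via S2 is accepted.
    static boolean replay() {
        long sessionA = 0xA;
        Leader leader = new Leader();
        // Steps 1-3: S1's revalidate is delayed; the client times out and moves to S2.
        // Step 4: S2's revalidate reaches the leader first.
        leader.revalidate(sessionA, "S2");
        // Step 5: S1's stale revalidate arrives late and overwrites the owner.
        leader.revalidate(sessionA, "S1");
        // Step 6: the client is connected to S2, but S2 no longer owns the session.
        return leader.checkSession(sessionA, "S2");
    }

    public static void main(String[] args) {
        System.out.println(replay() ? "accepted" : "session moved");
    }
}
```

Running `main` prints `session moved`: nothing in this model ever hands ownership back to S2, which is why the error persists for as long as the client stays on S2.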
The server needs to close the session in this case so that the client can
reconnect and recover from this corner case.
JIRA ZOOKEEPER-710 fixed the non-multi-op cases, but a client that only
sends multi-ops can still hit this problem; this diff addresses that case.
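The effect of the remedy on a multi-op-only client can be sketched with another toy model. Several assumptions are baked in and are not taken from the diff: "closing the session" is modeled as dropping the client's connection; reconnecting is modeled as an immediate revalidate that makes the new server the owner; the client is assumed to land on S2 again; and the pre-fix behaviour is modeled, as the comment suggests, as closing the connection only for non-multi requests. `Outcome`, `process`, `closeOnMulti`, and `multiOnlyClient` are all hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

public class SessionMovedHandling {
    // Hypothetical outcomes of a request whose session is owned by another server.
    enum Outcome { OK, SESSION_MOVED_ERROR, CONNECTION_CLOSED }

    // Hypothetical leader-side ownership table: sessionId -> owning server.
    static final Map<Long, String> owners = new HashMap<>();

    // Models request handling on `server`. With closeOnMulti == false, only
    // non-multi requests close the connection; a multi-op just gets an error
    // and the client stays connected to the same server.
    static Outcome process(long sessionId, String server, boolean isMulti, boolean closeOnMulti) {
        if (server.equals(owners.get(sessionId))) {
            return Outcome.OK;
        }
        if (!isMulti || closeOnMulti) {
            return Outcome.CONNECTION_CLOSED;
        }
        return Outcome.SESSION_MOVED_ERROR;
    }

    // Simulates a client that only sends multi-ops through `server`. Whenever its
    // connection is closed it reconnects to the same server, which revalidates
    // and takes ownership. Returns the outcome of the last of `attempts` requests.
    static Outcome multiOnlyClient(String server, boolean closeOnMulti, int attempts) {
        long sessionA = 0xA;
        owners.put(sessionA, "S1"); // state after S1's stale revalidate (step 5)
        Outcome last = null;
        for (int i = 0; i < attempts; i++) {
            last = process(sessionA, server, true, closeOnMulti);
            if (last == Outcome.CONNECTION_CLOSED) {
                owners.put(sessionA, server); // reconnect + revalidate
            }
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + multiOnlyClient("S2", false, 3));
        System.out.println("with fix:    " + multiOnlyClient("S2", true, 3));
    }
}
```

Without the fix the multi-only client keeps getting `SESSION_MOVED_ERROR` on every attempt; with it, the first request closes the connection, the reconnect revalidates through S2, and later requests succeed with `OK`.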
---