Hi devs,

ZOOKEEPER-4925[1] reports a data loss bug that can happen under the following conditions:

1. A follower that stalls for a while ends up with a hole in its `committedLog` after syncing with the leader. This was introduced in pr-2152[2], which fixed the NPE in `syncWithLeader` reported in ZOOKEEPER-4394[3], and it shipped in 3.9.3.
2. The hole introduced above can be propagated to other nodes if that follower becomes leader. We do not forbid discontinuous txns in all cases.
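
To make "hole" concrete, here is a minimal sketch of a contiguity check over a list of zxids. It is not ZooKeeper code and not what pr-2254 does: the class `ZxidGapCheck` and its methods are hypothetical names used only for illustration. It relies on the zxid layout (epoch in the high 32 bits, counter in the low 32 bits) and assumes that within one epoch the counter grows by exactly 1, while an entry from a higher epoch is accepted without checking its counter.

```java
import java.util.List;
import java.util.OptionalInt;

/** Hypothetical helper, not part of ZooKeeper: finds the first gap in a list of zxids. */
final class ZxidGapCheck {
    // A zxid is a 64-bit value: epoch in the high 32 bits, counter in the low 32 bits.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    static long counterOf(long zxid) {
        return zxid & 0xFFFFFFFFL;
    }

    /**
     * Returns the index of the first entry that does not directly follow its
     * predecessor, or empty if the list is contiguous.
     *
     * Assumption (for illustration only): inside one epoch the counter must
     * grow by exactly 1; an entry in a higher epoch is always accepted.
     */
    static OptionalInt firstHole(List<Long> zxids) {
        for (int i = 1; i < zxids.size(); i++) {
            long prev = zxids.get(i - 1);
            long cur = zxids.get(i);
            boolean nextInEpoch = epochOf(cur) == epochOf(prev)
                    && counterOf(cur) == counterOf(prev) + 1;
            boolean newerEpoch = epochOf(cur) > epochOf(prev);
            if (!nextInEpoch && !newerEpoch) {
                return OptionalInt.of(i);
            }
        }
        return OptionalInt.empty();
    }
}
```

With a check like this, a node could in principle notice a discontinuous log before serving it to other nodes; whether and where such a guard belongs is a question for the actual fix.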

I have opened pr-2254[4] to fix this. I would like to get it resolved before the next release, since the hole is easy to introduce under certain conditions. I have also raised this in the voting thread for 3.9.4-rc0[5].

Looking forward to your reviews!

Best,
Kezhu Wang

[1]: https://issues.apache.org/jira/browse/ZOOKEEPER-4925
[2]: https://github.com/apache/zookeeper/pull/2152
[3]: https://issues.apache.org/jira/browse/ZOOKEEPER-4394
[4]: https://github.com/apache/zookeeper/pull/2254
[5]: https://lists.apache.org/thread/sq5djdm5ttbscbtdw5ykp5vl4dfb8p56