kezhuw opened a new pull request, #2266: URL: https://github.com/apache/zookeeper/pull/2266
There are two variants of `ZooKeeperServer::processTxn`, and they have diverged significantly since ZOOKEEPER-3484. In addition to what `processTxn(TxnHeader hdr, Record txn)` does, `processTxn(Request request)` pops the outstanding change from `outstandingChanges` and adds the txn to `committedLog` for followers to sync from.

Since ZOOKEEPER-4394, the `Learner` uses `processTxn(TxnHeader hdr, Record txn)` to commit txns to memory, which means it leaves `committedLog` untouched in the `SYNCHRONIZATION` phase. As a result, a stale follower will have a hole in its `committedLog` after joining the cluster. If that follower later becomes leader, it will propagate the in-memory hole to other stale nodes. This causes data loss.

The test case fails on master and 3.9.3, and passes on 3.9.2, so only 3.9.3 is affected.

This commit drops `processTxn(TxnHeader hdr, Record txn)`, as `processTxn(Request request)` can handle the `SYNCHRONIZATION` phase too. It also rejects discontinuous proposals in `syncWithLeader` and `committedLog` to avoid possible data loss.

Refs: ZOOKEEPER-4925, ZOOKEEPER-4394, ZOOKEEPER-3484
Reviewers: li4wang
Author: kezhuw
Closes #2254 from kezhuw/ZOOKEEPER-4925-fix-data-loss

(cherry picked from commit e5dd60bf0512ccc1e090d99410a8da48623219da)
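
To illustrate why rejecting discontinuous entries in `committedLog` matters, here is a minimal sketch, not the code from this pull request. `CommittedLogSketch`, `add`, and `diffSince` are hypothetical names, and the sketch assumes txn ids are plain consecutive longs, ignoring the epoch that real zxids carry in their high 32 bits.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical stand-in for an in-memory committed log; not ZooKeeper's actual class. */
class CommittedLogSketch {
    private final Deque<Long> zxids = new ArrayDeque<>();

    /** Appends a committed txn id, refusing any id that would leave a gap. */
    void add(long zxid) {
        Long last = zxids.peekLast();
        // Assumption: ids in the log are consecutive; epoch rollover is ignored here.
        if (last != null && zxid != last + 1) {
            throw new IllegalStateException(
                "discontinuous txn: last=" + last + ", got=" + zxid);
        }
        zxids.addLast(zxid);
    }

    /** Returns the ids a peer needs to catch up, given the last id it has seen. */
    List<Long> diffSince(long peerLast) {
        List<Long> out = new ArrayList<>();
        for (long z : zxids) {
            if (z > peerLast) {
                out.add(z);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        CommittedLogSketch log = new CommittedLogSketch();
        log.add(1);
        log.add(2);
        try {
            // Txn 3 was applied to memory elsewhere but never logged.
            log.add(4);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println(log.diffSince(1));
    }
}
```

Without the check in `add`, the log would hold 1, 2, 4, and a peer syncing from it via `diffSince(2)` would receive only 4 and silently miss txn 3. Failing fast at append time surfaces the gap before it can be served to another node.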