kezhuw opened a new pull request, #2266:
URL: https://github.com/apache/zookeeper/pull/2266

   There are two variants of `ZooKeeperServer::processTxn`, and they have 
diverged significantly since ZOOKEEPER-3484. In addition to what 
`processTxn(TxnHeader hdr, Record txn)` does, `processTxn(Request request)` 
pops the outstanding change from `outstandingChanges` and adds the txn to 
`committedLog` so that followers can sync from it. Since ZOOKEEPER-4394, the 
`Learner` uses `processTxn(TxnHeader hdr, Record txn)` to commit txns to 
memory, which means it leaves `committedLog` untouched in the 
`SYNCHRONIZATION` phase.
   
   As a result, a stale follower will have a hole in its `committedLog` after 
joining the cluster. After becoming leader, that follower will propagate the 
in-memory hole to other stale nodes. This causes data loss.
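   
   A minimal sketch of how such a hole can arise is shown here. It is not 
ZooKeeper code: the class `CommittedLogHoleSketch` and its methods 
`applyOnly` and `applyAndRecord` are hypothetical stand-ins for the two 
`processTxn` variants, and plain lists of zxids stand in for the in-memory 
database and `committedLog`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not ZooKeeper's implementation.
final class CommittedLogHoleSketch {
    // Stands in for the in-memory database: every applied zxid lands here.
    final List<Long> applied = new ArrayList<>();
    // Stands in for committedLog: what this node can later serve to learners.
    final List<Long> committedLog = new ArrayList<>();

    // Models processTxn(TxnHeader hdr, Record txn): applies to memory only.
    public void applyOnly(long zxid) {
        applied.add(zxid);
    }

    // Models processTxn(Request request): applies and records in committedLog.
    public void applyAndRecord(long zxid) {
        applied.add(zxid);
        committedLog.add(zxid);
    }

    // Zxids that were applied to memory but are absent from committedLog.
    public List<Long> missingFromCommittedLog() {
        List<Long> missing = new ArrayList<>(applied);
        missing.removeAll(committedLog);
        return missing;
    }

    public static void main(String[] args) {
        CommittedLogHoleSketch follower = new CommittedLogHoleSketch();
        follower.applyAndRecord(1L); // committed while broadcasting
        follower.applyAndRecord(2L);
        follower.applyOnly(3L);      // caught up during synchronization
        follower.applyOnly(4L);
        follower.applyAndRecord(5L); // broadcasting again after the sync
        System.out.println("committedLog = " + follower.committedLog);
        System.out.println("missing      = " + follower.missingFromCommittedLog());
    }
}
```

   In this model the node's memory holds every txn, but a learner that is 
synced from its `committedLog` alone would never receive the txns applied 
through `applyOnly`.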
   
   The test case fails on master and 3.9.3, and passes on 3.9.2, so among 
released versions only 3.9.3 is affected.
   
   This commit drops `processTxn(TxnHeader hdr, Record txn)`, as 
`processTxn(Request request)` can be used in the `SYNCHRONIZATION` phase too.
   
   This commit also rejects discontinuous proposals in `syncWithLeader` and 
`committedLog` to avoid possible data loss.
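   
   The idea of rejecting discontinuous proposals can be sketched as follows. 
This is an illustration under stated assumptions, not the check in this 
commit: the class `ContinuousCommittedLog` and its helpers are hypothetical, 
and the continuity rule (the counter advances by one within an epoch and 
restarts at 1 in a higher epoch) is assumed. It relies only on the zxid 
layout of an epoch in the high 32 bits and a counter in the low 32 bits.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch, not the check implemented in ZooKeeper.
final class ContinuousCommittedLog {
    private final Deque<Long> zxids = new ArrayDeque<>();

    // A zxid carries the epoch in its high 32 bits and a counter in its low 32 bits.
    public static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    public static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    public static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // Assumed rule: same epoch needs counter + 1; a higher epoch must start at counter 1.
    public static boolean follows(long prev, long next) {
        if (epochOf(next) == epochOf(prev)) {
            return counterOf(next) == counterOf(prev) + 1;
        }
        return epochOf(next) > epochOf(prev) && counterOf(next) == 1;
    }

    // Appends the zxid, or throws if it would leave a gap after the last entry.
    public void add(long zxid) {
        Long last = zxids.peekLast();
        if (last != null && !follows(last, zxid)) {
            throw new IllegalStateException(
                "discontinuous zxid 0x" + Long.toHexString(zxid)
                    + " after 0x" + Long.toHexString(last));
        }
        zxids.addLast(zxid);
    }

    public int size() {
        return zxids.size();
    }

    public static void main(String[] args) {
        ContinuousCommittedLog log = new ContinuousCommittedLog();
        log.add(makeZxid(1, 1));
        log.add(makeZxid(1, 2));
        try {
            log.add(makeZxid(1, 4)); // skips counter 3, so it is rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

   Failing fast on a gap means a hole surfaces as an error at the point where 
it would be introduced, instead of being silently passed on to other nodes.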
   
   Refs: ZOOKEEPER-4925, ZOOKEEPER-4394, ZOOKEEPER-3484
   
   Reviewers: li4wang
   Author: kezhuw
   Closes #2254 from kezhuw/ZOOKEEPER-4925-fix-data-loss
   
   (cherry picked from commit e5dd60bf0512ccc1e090d99410a8da48623219da)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
