kezhuw commented on PR #1925:
URL: https://github.com/apache/zookeeper/pull/1925#issuecomment-1535372739

   > Hmm, that other PR doesn't look right to me. It fails to store the TXNs 
that aren't already committed before ACKing the NEWLEADER, which was what 
ZOOKEEPER-3911 was all about, in the first place. Agree?
   
   > // ZOOKEEPER-3911: make sure sync the uncommitted logs before commit them 
(ACK NEWLEADER).
   
   Given that, I would say #1445 itself is misleading or we are 
misunderstanding the jira part of [ZOOKEEPER-3911][], or both. It should sync 
only txns in `DIFF` sync, all proposals in this phase are considered committed 
by new leader. I think [ZOOKEEPER-4394][] already/almost listed the point: 
`NEWLEADER` is not appended intermediately/atomically after proposals in `DIFF` 
sync. I guess we are not aware of this by the time of #1445.
   
   These ongoing(committed or not) proposals are simply beyond of discussion, 
these belongs to `broadcast` phase(See [Zab][], [Zab1.0][]). I considered them 
as gap between paper and implementation. Steps in paper are mostly like atomic, 
but implementations are not.
   
   Back to this pr, the good part is that it is verified in production. The bad 
part is that it is giant and mixed several issues and areas.
   
   * [ZOOKEEPER-4409][] `NullPointerException` in `SendAckRequestProcessor`
   * [ZOOKEEPER-4502][] `SyncRequestProcessor` leaks in 
`ZooKeeperServer::shutdown`.
   * Drop [preZab1.0][]. This might make this pr not cherry-pick-able, as it 
could break clients to 3.7.x and 3.8.x. Though, I think it was probably broken 
by [ZOOKEEPER-3104][] long before.
   * `DIFF` syncs are not persisted. There are several issues and prs.
     * [ZOOKEEPER-3023][]: Flaky 
test`Zab1_0Test.testNormalFollowerRunWithDiff`. This predates 
[ZOOKEEPER-3911][], basically we can consider #1848 as fix to 
[ZOOKEEPER-2678][].
     * [ZOOKEEPER-3911][]: Data inconsistency caused by DIFF sync uncommitted 
log. #1445 tried, but failed, so we are here.
     * [ZOOKEEPER-4394][]: `Learner.syncWithLeader` got `NullPointerException`. 
I think this is the symptom of #1445 and lagging `NEWLEADER`. Targeting a fix 
to avoid `NullPointerException` is going wrong direction from my point of view. 
#1930 will fix the `NullPointerException` but no more than that.
     * [ZOOKEEPER-4646][](Committed txns may still be lost if followers crash 
after replying ACK of NEWLEADER but before writing txns to disk), 
[ZOOKEEPER-4685][](Unnecessary system unavailability due to Leader shutdown 
when follower sent ACK of PROPOSAL before sending ACK of NEWLEADER in log 
recovery). I was impressed by the analysis. #1993 is the ongoing effort. But i 
think it was misled by #1445, hence does not fix [ZOOKEEPER-4394][].
     * This pr. To be honest, I don't like this part of this pr. It tied up 
`synchronization` and `broadcast` phase. Personally, I prefer a synchronous 
approach as #1848 and #1993 adopted, I think they are clean and 
straightforward. Though, they have cons/drawbacks as #1993 pointed out.
   
     In long term, I would suggest to send `NEWLEADER` just after `DIFF` sync, 
this aligns to the paper. But it won't fix this persistent issue.
   
   * Refactor. I like this part, but it may increase review burden.
   
   Personally, I would suggest to fix above issues separately. There are other 
possible issues in `synchronization` phase.
   * [ZOOKEEPER-4643][]: Committed txns may be improperly truncated if follower 
crashes right after updating currentEpoch but before persisting txns to disk
   * `NEWLEADER` is not sent intermediately after `DIFF` syncs. This diverges 
from the paper. [ZOOKEEPER-4394][] pointed out this.
   * It is not guaranteed/asserted in `LearnerHandler` that the first `ACK` is 
targeting `NEWLEADER`.  [ZOOKEEPER-4685][] pointed out this.
   * Unmatched commits are causing `log.warn`s in `synchronization` phase, 
while they cause system exit in `FollowerZooKeeperServer.commit`.
   
   > IMO, this is pretty high priority bug to fix because it requires manual 
intervention to recover from. Otherwise, the cluster is in a permanently 
inconsistent state.
   
   > I think it would be good to conclude this work soon. It's getting a bit 
stale :)
   
   Hmm, some of my prs #1820(merged), #1859(approved) and #1847 are all almost 
one year from its creation. I also opened apache/bookkeeper#3041(data loss, 1 
year), apache/pulsar#7490(data duplication, 3 years) and 
google/leveldb#375(data inconsistency, 3 years until google/leveldb#339 got 
merged). I guess we should believe in time 😮‍💨.
   
   > Oof, would probably have been a good idea, but one thing led to another, 
and now it'd be a lot of work to split this 😬 🙂
   
   Maybe we can start from fresh fixes ? Anyway, it may not be a pleasure 
process 😨 😵‍💫.  I believed that getting pr concentrated is helpful to get 
merged. People might fear of giant pr, at least, I was hesitated (part of this) 
to get involved till today.
   
   Could you please take a look at this and alternatives #1848 and #1993 ? 
@eolivelli @breed @cnauroth @hanm @nkalmar @ztzg @anmolnar  @tisonkun @li4wang
   
   [ZOOKEEPER-2678]: https://issues.apache.org/jira/browse/ZOOKEEPER-2678
   [ZOOKEEPER-3023]: https://issues.apache.org/jira/browse/ZOOKEEPER-3023
   [ZOOKEEPER-3911]: https://issues.apache.org/jira/browse/ZOOKEEPER-3911
   [ZOOKEEPER-4409]: https://issues.apache.org/jira/browse/ZOOKEEPER-4409
   [ZOOKEEPER-4502]: https://issues.apache.org/jira/browse/ZOOKEEPER-4502
   [ZOOKEEPER-4394]: https://issues.apache.org/jira/browse/ZOOKEEPER-4394
   [ZOOKEEPER-3104]: https://issues.apache.org/jira/browse/ZOOKEEPER-3104
   [ZOOKEEPER-3642]: https://issues.apache.org/jira/browse/ZOOKEEPER-3642
   [ZOOKEEPER-4643]: https://issues.apache.org/jira/browse/ZOOKEEPER-4643
   [ZOOKEEPER-4646]: https://issues.apache.org/jira/browse/ZOOKEEPER-4646
   [ZOOKEEPER-4685]: https://issues.apache.org/jira/browse/ZOOKEEPER-4685
   [Zab]: 
https://distributedalgorithm.wordpress.com/2015/06/20/architecture-of-zab-zookeeper-atomic-broadcast-protocol/
   [Zab1.0]: https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0
   [preZab1.0]: 
https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+Pre+1.0
   [Heartbleed]: https://en.wikipedia.org/wiki/Heartbleed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@zookeeper.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to