kezhuw commented on PR #1925: URL: https://github.com/apache/zookeeper/pull/1925#issuecomment-1535372739
> Hmm, that other PR doesn't look right to me. It fails to store the TXNs that aren't already committed before ACKing the NEWLEADER, which was what ZOOKEEPER-3911 was all about, in the first place. Agree? > // ZOOKEEPER-3911: make sure sync the uncommitted logs before commit them (ACK NEWLEADER). Given that, I would say #1445 itself is misleading or we are misunderstanding the jira part of [ZOOKEEPER-3911][], or both. It should sync only txns in `DIFF` sync, all proposals in this phase are considered committed by new leader. I think [ZOOKEEPER-4394][] already/almost listed the point: `NEWLEADER` is not appended intermediately/atomically after proposals in `DIFF` sync. I guess we are not aware of this by the time of #1445. These ongoing(committed or not) proposals are simply beyond of discussion, these belongs to `broadcast` phase(See [Zab][], [Zab1.0][]). I considered them as gap between paper and implementation. Steps in paper are mostly like atomic, but implementations are not. Back to this pr, the good part is that it is verified in production. The bad part is that it is giant and mixed several issues and areas. * [ZOOKEEPER-4409][] `NullPointerException` in `SendAckRequestProcessor` * [ZOOKEEPER-4502][] `SyncRequestProcessor` leaks in `ZooKeeperServer::shutdown`. * Drop [preZab1.0][]. This might make this pr not cherry-pick-able, as it could break clients to 3.7.x and 3.8.x. Though, I think it was probably broken by [ZOOKEEPER-3104][] long before. * `DIFF` syncs are not persisted. There are several issues and prs. * [ZOOKEEPER-3023][]: Flaky test`Zab1_0Test.testNormalFollowerRunWithDiff`. This predates [ZOOKEEPER-3911][], basically we can consider #1848 as fix to [ZOOKEEPER-2678][]. * [ZOOKEEPER-3911][]: Data inconsistency caused by DIFF sync uncommitted log. #1445 tried, but failed, so we are here. * [ZOOKEEPER-4394][]: `Learner.syncWithLeader` got `NullPointerException`. I think this is the symptom of #1445 and lagging `NEWLEADER`. Targeting a fix to avoid `NullPointerException` is going wrong direction from my point of view. #1930 will fix the `NullPointerException` but no more than that. * [ZOOKEEPER-4646][](Committed txns may still be lost if followers crash after replying ACK of NEWLEADER but before writing txns to disk), [ZOOKEEPER-4685][](Unnecessary system unavailability due to Leader shutdown when follower sent ACK of PROPOSAL before sending ACK of NEWLEADER in log recovery). I was impressed by the analysis. #1993 is the ongoing effort. But i think it was misled by #1445, hence does not fix [ZOOKEEPER-4394][]. * This pr. To be honest, I don't like this part of this pr. It tied up `synchronization` and `broadcast` phase. Personally, I prefer a synchronous approach as #1848 and #1993 adopted, I think they are clean and straightforward. Though, they have cons/drawbacks as #1993 pointed out. In long term, I would suggest to send `NEWLEADER` just after `DIFF` sync, this aligns to the paper. But it won't fix this persistent issue. * Refactor. I like this part, but it may increase review burden. Personally, I would suggest to fix above issues separately. There are other possible issues in `synchronization` phase. * [ZOOKEEPER-4643][]: Committed txns may be improperly truncated if follower crashes right after updating currentEpoch but before persisting txns to disk * `NEWLEADER` is not sent intermediately after `DIFF` syncs. This diverges from the paper. [ZOOKEEPER-4394][] pointed out this. * It is not guaranteed/asserted in `LearnerHandler` that the first `ACK` is targeting `NEWLEADER`. [ZOOKEEPER-4685][] pointed out this. * Unmatched commits are causing `log.warn`s in `synchronization` phase, while they cause system exit in `FollowerZooKeeperServer.commit`. > IMO, this is pretty high priority bug to fix because it requires manual intervention to recover from. Otherwise, the cluster is in a permanently inconsistent state. > I think it would be good to conclude this work soon. It's getting a bit stale :) Hmm, some of my prs #1820(merged), #1859(approved) and #1847 are all almost one year from its creation. I also opened apache/bookkeeper#3041(data loss, 1 year), apache/pulsar#7490(data duplication, 3 years) and google/leveldb#375(data inconsistency, 3 years until google/leveldb#339 got merged). I guess we should believe in time 😮💨. > Oof, would probably have been a good idea, but one thing led to another, and now it'd be a lot of work to split this 😬 🙂 Maybe we can start from fresh fixes ? Anyway, it may not be a pleasure process 😨 😵💫. I believed that getting pr concentrated is helpful to get merged. People might fear of giant pr, at least, I was hesitated (part of this) to get involved till today. Could you please take a look at this and alternatives #1848 and #1993 ? @eolivelli @breed @cnauroth @hanm @nkalmar @ztzg @anmolnar @tisonkun @li4wang [ZOOKEEPER-2678]: https://issues.apache.org/jira/browse/ZOOKEEPER-2678 [ZOOKEEPER-3023]: https://issues.apache.org/jira/browse/ZOOKEEPER-3023 [ZOOKEEPER-3911]: https://issues.apache.org/jira/browse/ZOOKEEPER-3911 [ZOOKEEPER-4409]: https://issues.apache.org/jira/browse/ZOOKEEPER-4409 [ZOOKEEPER-4502]: https://issues.apache.org/jira/browse/ZOOKEEPER-4502 [ZOOKEEPER-4394]: https://issues.apache.org/jira/browse/ZOOKEEPER-4394 [ZOOKEEPER-3104]: https://issues.apache.org/jira/browse/ZOOKEEPER-3104 [ZOOKEEPER-3642]: https://issues.apache.org/jira/browse/ZOOKEEPER-3642 [ZOOKEEPER-4643]: https://issues.apache.org/jira/browse/ZOOKEEPER-4643 [ZOOKEEPER-4646]: https://issues.apache.org/jira/browse/ZOOKEEPER-4646 [ZOOKEEPER-4685]: https://issues.apache.org/jira/browse/ZOOKEEPER-4685 [Zab]: https://distributedalgorithm.wordpress.com/2015/06/20/architecture-of-zab-zookeeper-atomic-broadcast-protocol/ [Zab1.0]: https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0 [preZab1.0]: https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab+Pre+1.0 [Heartbleed]: https://en.wikipedia.org/wiki/Heartbleed -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: notifications-unsubscr...@zookeeper.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org