hanm commented on pull request #1690: URL: https://github.com/apache/zookeeper/pull/1690#issuecomment-847186137
> @hanm Maybe it would be inappropriate for me to ask the following questions under this PR. Now I am learning ZAB 1.0 and preparing to write a spec. Could you please let me ask about some details in it?
>
> Currently I have several problems about ZAB 1.0.
>
> 1. For some detailed industrial implementation (like messages with type PING, connections between leader and followers), do I need to implement these in the spec?

It depends on whether the particular message type is critical to revealing important details of the protocol. Messages like NEWLEADER or UPTODATE are an integral part of the protocol; I don't think we can abstract them away. For PING though, I think we can omit it for now.

> 2. I see before sending NEWLEADER, leader will choose a best message from SNAP, TRUNC, DIFF to send according to the corresponding follower's state. Do I need to implement these, or abstract this part and just send NEWLEADER to sync with followers?

I am inclined to abstract this part away using a "RECOVERY_SYNC" operation. In fact, that's what the original pre-1.0 did, where a leader always syncs learners with the full history. We can go more fine-grained later. This will hopefully reduce the state space.

> 3. I don't understand whether variable 'loracle' is more like a global variable that has the latest leader ID and every server can reach it, or 'loracle' is a local variable like 'votedFor' in Raft. (Actually I used the latter in Zab.tla, which corresponds to 'leaderOracle'. And I think it has no effect on correctness no matter which one is used.)

ZAB 1.0 and its implementation do not use this "loracle". Instead, a server always starts a leader election and gets the latest leader ID from the leader election results. If we want an abstraction here, I would tend to think of "loracle" as a global variable.

> 4. The last problem I want to ask about is recovery, because I didn't see notes about this part in the paper and the link.
> I saw how the leader handles a new server that wants to join, but I didn't see how a restarted server finds the latest leader. If 'loracle' is the former in question 3, I can understand this part. In Zab.tla, I use the method where the follower broadcasts messages to ask other servers what their local oracle is, and updates its oracle when receiving the same oracle and epoch from a quorum (I used this method because I saw servers recover like this in Viewstamped Replication).

As mentioned earlier, a restarted (or newly joined) server finds the latest leader through leader election.

> A follower must have received the corresponding PROPOSAL when receiving COMMIT, if the follower is an initial server that joins Q. But I think this may not always be true when the follower is one that joins Q midway. So I want to know how a follower catches up its state when receiving a COMMIT corresponding to a transaction that does not exist in its local history. (You could see this condition in the image at the bottom of README. When I wrote the spec for Zab pre-1.0, I chose to let followers keep re-sending CEPOCH to obtain the latest transactions until the conflict described above disappears.)

The "how a follower catches up" part of the protocol is the recovery part (SNAP, DIFF, TRUNC). A follower will not start broadcast until the recovery phase is finished. The invariant here is that once recovery has finished, the follower has the latest history of the quorum / leader, so any COMMIT it receives has a corresponding entry in its history. This differs from Paxos or Raft, where there is no dedicated recovery phase.

> I am very sorry if my problems bother you. Thank you for the various feedback and suggestions you have provided before!

No worries, I am not super active in the community these days, but I will answer questions on the issues that I am involved in (with undefined SLA).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
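
For readers writing a spec along these lines, the recovery invariant from the reply above (no broadcast until sync is finished, so every COMMIT matches an entry in the follower's history) can be sketched roughly as follows. This is a toy Python model, not ZooKeeper's implementation: the `Follower` class, `recovery_sync`, `demo`, and the use of plain integers as zxids are all assumptions made for illustration, and SNAP/DIFF/TRUNC are collapsed into a single full-history sync, as suggested in the answer to question 2.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """Toy follower; every name here is hypothetical, not ZooKeeper's API."""
    history: list = field(default_factory=list)    # proposed zxids, in order
    committed: list = field(default_factory=list)  # committed zxids, in order
    synced: bool = False                           # True once recovery is done

    def recovery_sync(self, leader_history):
        # Abstract SNAP/DIFF/TRUNC into one step: adopt the leader's full history.
        self.history = list(leader_history)
        self.synced = True

    def on_proposal(self, zxid):
        # Broadcast messages are only accepted after recovery has finished.
        if not self.synced:
            raise RuntimeError("broadcast before recovery finished")
        self.history.append(zxid)

    def on_commit(self, zxid):
        if not self.synced:
            raise RuntimeError("broadcast before recovery finished")
        # Invariant from the reply: every COMMIT matches an entry in history.
        if zxid not in self.history:
            raise AssertionError(f"COMMIT {zxid} has no PROPOSAL in history")
        self.committed.append(zxid)


def demo():
    leader_history = [1, 2, 3]       # proposals made before the follower joined
    f = Follower()
    f.recovery_sync(leader_history)  # follower joins Q midway and recovers first
    f.on_commit(3)                   # COMMIT for a pre-join proposal is accepted
    f.on_proposal(4)                 # normal broadcast after recovery
    f.on_commit(4)
    return f.committed
```

In this model a follower that joins midway never sees a COMMIT for an unknown transaction, because `recovery_sync` has already copied the leader's history; calling `on_commit` on an unsynced follower raises instead of silently accepting the message.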
