Hi Jun, See my comments below.
On Thu, Mar 14, 2024 at 3:38 PM Jun Rao <j...@confluent.io.invalid> wrote: > 37. Have you updated the wiki? It seems that LeaderIdAndEpoch and > NodeEpoint are still two separate structs. It is updated now. Apologies for the delayed wiki updates but I was dealing with other issues in the past couple of weeks. > 45. kafka-storage format --cluster-id <cluster-id> --release-version 3.8 > --standalone --config controller.properties > It seems that --release-version is optional and the default value of this > initial metadata.version is statically defined by the binary used to run > the tool. So, in the common case, it seems that we could just omit > --release-version. That's correct. I removed it from the user explanation section and updated the reference explanation section for the kafka-storage tool. > 46. The KIP says "The leader will learn the range of supported version from > the UpdateVoter RPC. > 46.1 Then "KRaft will use the information in the controller registration, > broker registration and add voter records to determine if the new version > is compatible." is no longer accurate. Good catch. I updated it to "KRaft will use the information in the broker registration, voters records and UpdateVoters RPC to determine if the new kraft.version are compatible. The new metadata.version will be handled by the controller (QuorumController) like it is today." > 46.2 Do we have a bit of a chicken-and-egg problem? A voter can't send > UpdateVoter RPC until kraft.version is upgraded to 1. But to upgrade > kraft.version, the leader needs to receive UpdateVoter RPCs. Yes we do. It is a little subtle but this is why I relaxed UpdateVoter so that it is sent irrespective of the kraft.version. The voters that are following the leader will send a UpdateVoter request as long as the "ApiKey" is in the ApiVersion response. I updated this sentence in the UpdateVoter RPC section to make this clear: "The voter will always send the UpdateVoter RPC whenever it starts and whenever the leader changes irrespective of the kraft.version" The KIP already has this wording in the handling section: "5. Append the updated VotersRecord to the log if the finalized kraft.version is greater than 0." > 46.3 If the leader always learns the range of supported kraft.version from > UpdateVoter RPC, does the VotersRecord need to include KRaftVersionFeature? Yes. Storing it in VotersRecord allows the feature (kraft.version) to be upgraded while K voters are offline where K is the number of failures tolerated by the voter set. Upgrading from version kraft.version 0 to kraft.version 1 is special. To upgrade from version 0 to version 1 all of the voters needed to be online at some point while the current leader has been leader. > 47. When a voter is started, in what order does it send the UpdateVoter and > ControllerRegistration RPC? They are independent of one another and can be sent in any order. Why do you ask? > 48. Voters set: "If the partition doesn't contain a VotersRecord control > record then the value stored in the controller.quorum.voters property will > be used." > Hmm, I thought controller.quorum.voters is only the fallback > for controller.quorum.bootstrap.servers? My observation is that the current controller.quorum.voters configuration plays two roles. 1) It is the voters set for "voter replicas". 2) It is the bootstrapping endpoint for "observer replicas". For 1) the resolution is going to be VotersRecord first, controller.quorum.voters second. For 2) the resolution is going to be controller.quorum.bootstrap.servers first, controller.quorum.voters second. This is mainly there for backwards compatibility with configurations that are valid in 3.7 and before. > 49. "If the local replica is getting added to the voters set, it will allow > the transition to prospective candidate when the fetch timer expires." > Could you explain why this is needed? This is just saying that when a replica sees a VotersRecord which includes itself it means that this local replica has become a voter. Only voters are allowed to transition to prospective with KIP-996 (pre-vote) or transition to candidate (without KIP-996). > 50. Quorum state: AppliedOffset will get removed in version 1. > This is the original description of AppliedOffset: Reflects the maximum > offset that has been applied to this quorum state. This is used for log > recovery. The broker must scan from this point on initialization to detect > updates to this file. If we remove this field, how do we reason about the > consistency between the quorum state and the metadata log? With this KIP, these are the only fields that are present in the lastest quorum data: leader id, leader epoch, voted id and voted uuid. None of them are related to the voters set that is stored in the log and snapshot. For example, it is correct for the leader id to be in the voters set and it is also correct for the leader id to not be in the voter set. This second case can happen when the leader is removing itself from the voters set. For example, voter id and voted uuid may or may not be in the voters set. See this wording in the KIP: "Since the set of voters can change and not all replicas know the latest voters set, handling of Vote requests needs to be relaxed from what was defined and implemented for KIP-595. KRaft replicas will accept Vote requests from all replicas. Candidate replicas don't need to be in the voters' voters set to receive a vote. This is needed to be able to elect a leader from the new voters set even though the new voters set hasn't been replicated to all of its voters." > 51. AddVoter has the following steps. > 1. Wait for the fetch offset of the replica (ID, UUID) to catch up to the > log end offset of the leader. > 2. Wait until there are no uncommitted VotersRecord. Note that the > implementation may just return a REQUEST_TIMED_OUT error if there are > pending operations. > 3. Wait for the LeaderChangeMessage control record from the current epoch > to get committed. Note that the implementation may just return a > REQUEST_TIMED_OUT error if there are pending operations. > 4. Send an ApiVersions RPC to the first listener to discover the supported > kraft.version of the new voter. > It seems that doing the check on supported kraft.version of the new voter > in step 4 is too late. If the new voter doesn't support kraft.version of 1, > it can't process the metadata log records and step 1 could fail. That's true. Good catch. I think the best change is to move the check in 1. after 4. I updated the KIP. > 52. Admin client: Could you provide a bit more details on the changes? Yes. Let me update the KIP. I'll send another email when I have made these changes. > 53. A few more typos. > ... Thanks for the suggestions. Updated the KIP accordingly. -- -José