Hi Jun,

See my comments below.

On Thu, Mar 14, 2024 at 3:38 PM Jun Rao <j...@confluent.io.invalid> wrote:
> 37. Have you updated the wiki? It seems that LeaderIdAndEpoch and
> NodeEpoint are still two separate structs.

It is updated now. Apologies for the delayed wiki updates but I was
dealing with other issues in the past couple of weeks.

> 45. kafka-storage format --cluster-id <cluster-id> --release-version 3.8
> --standalone --config controller.properties
> It seems that --release-version is optional and the default value of this
> initial metadata.version is statically defined by the binary used to run
> the tool. So, in the common case, it seems that we could just omit
> --release-version.

That's correct. I removed it from the user explanation section and
updated the reference explanation section for the kafka-storage tool.

> 46. The KIP says "The leader will learn the range of supported version from
> the UpdateVoter RPC.
> 46.1 Then "KRaft will use the information in the controller registration,
> broker registration and add voter records to determine if the new version
> is compatible." is no longer accurate.

Good catch. I updated it to "KRaft will use the information in the
broker registration, voters records and UpdateVoters RPC to determine
if the new kraft.version are compatible. The new metadata.version will
be handled by the controller (QuorumController) like it is today."

> 46.2 Do we have a bit of a chicken-and-egg problem? A voter can't send
> UpdateVoter RPC until kraft.version is upgraded to 1. But to upgrade
> kraft.version, the leader needs to receive UpdateVoter RPCs.

Yes we do. It is a little subtle but this is why I relaxed UpdateVoter
so that it is sent irrespective of the kraft.version. The voters that
are following the leader will send a UpdateVoter request as long as
the "ApiKey" is in the ApiVersion response. I updated this sentence in
the UpdateVoter RPC section to make this clear:

"The voter will always send the UpdateVoter RPC whenever it starts and
whenever the leader changes irrespective of the kraft.version"

The KIP already has this wording in the handling section:

"5. Append the updated VotersRecord to the log if the finalized
kraft.version is greater than 0."

> 46.3 If the leader always learns the range of supported kraft.version from
> UpdateVoter RPC, does the VotersRecord need to include KRaftVersionFeature?

Yes. Storing it in VotersRecord allows the feature (kraft.version) to
be upgraded while K voters are offline where K is the number of
failures tolerated by the voter set. Upgrading from version
kraft.version 0 to kraft.version 1 is special. To upgrade from version
0 to version 1 all of the voters needed to be online at some point
while the current leader has been leader.

> 47. When a voter is started, in what order does it send the UpdateVoter and
> ControllerRegistration RPC?

They are independent of one another and can be sent in any order. Why
do you ask?

> 48. Voters set: "If the partition doesn't contain a VotersRecord control
> record then the value stored in the controller.quorum.voters property will
> be used."
>   Hmm, I thought controller.quorum.voters is only the fallback
> for controller.quorum.bootstrap.servers?

My observation is that the current controller.quorum.voters
configuration plays two roles. 1) It is the voters set for "voter
replicas". 2) It is the bootstrapping endpoint for "observer
replicas".

For 1) the resolution is going to be VotersRecord first,
controller.quorum.voters second.
For 2) the resolution is going to be
controller.quorum.bootstrap.servers first, controller.quorum.voters
second.

This is mainly there for backwards compatibility with configurations
that are valid in 3.7 and before.

> 49. "If the local replica is getting added to the voters set, it will allow
> the transition to prospective candidate when the fetch timer expires."
>   Could you explain why this is needed?

This is just saying that when a replica sees a VotersRecord which
includes itself it means that this local replica has become a voter.
Only voters are allowed to transition to prospective with KIP-996
(pre-vote) or transition to candidate (without KIP-996).

> 50. Quorum state: AppliedOffset will get removed in version 1.
>   This is the original description of AppliedOffset: Reflects the maximum
> offset that has been applied to this quorum state. This is used for log
> recovery. The broker must scan from this point on initialization to detect
> updates to this file. If we remove this field, how do we reason about the
> consistency between the quorum state and the metadata log?

With this KIP, these are the only fields that are present in the
lastest quorum data:
leader id, leader epoch, voted id and voted uuid. None of them are
related to the voters set that is stored in the log and snapshot.

For example, it is correct for the leader id to be in the voters set
and it is also correct for the leader id to not be in the voter set.
This second case can happen when the leader is removing itself from
the voters set.

For example, voter id and voted uuid may or may not be in the voters
set. See this wording in the KIP: "Since the set of voters can change
and not all replicas know the latest voters set, handling of Vote
requests needs to be relaxed from what was defined and implemented for
KIP-595. KRaft replicas will accept Vote requests from all replicas.
Candidate replicas don't need to be in the voters' voters set to
receive a vote. This is needed to be able to elect a leader from the
new voters set even though the new voters set hasn't been replicated
to all of its voters."

> 51. AddVoter has the following steps.
> 1. Wait for the fetch offset of the replica (ID, UUID) to catch up to the
> log end offset of the leader.
> 2. Wait until there are no uncommitted VotersRecord. Note that the
> implementation may just return a REQUEST_TIMED_OUT error if there are
> pending operations.
> 3. Wait for the LeaderChangeMessage control record from the current epoch
> to get committed. Note that the implementation may just return a
> REQUEST_TIMED_OUT error if there are pending operations.
> 4. Send an ApiVersions RPC to the first listener to discover the supported
> kraft.version of the new voter.
> It seems that doing the check on supported kraft.version of the new voter
> in step 4 is too late. If the new voter doesn't support kraft.version of 1,
> it can't process the metadata log records and step 1 could fail.

That's true. Good catch. I think the best change is to move the check
in 1. after 4. I updated the KIP.

> 52. Admin client: Could you provide a bit more details on the changes?

Yes. Let me update the KIP. I'll send another email when I have made
these changes.

> 53. A few more typos.
> ...

Thanks for the suggestions. Updated the KIP accordingly.

-- 
-José

Reply via email to