Thanks for your feedback Jason and excuse the delayed response.

See comments below.

On Tue, Jan 9, 2024 at 5:08 PM Jason Gustafson
<ja...@confluent.io.invalid> wrote:
>
> Hi Jose,
>
> Thanks for the KIP! A few initial questions below:
>
> 1. In the user experience section, the user is expected to provide supply
> the UUID for each voter. I'm assuming this is the directory.id coming from
> KIP-858. I thought it was generated by the format comand automatically? It
> seems like we will need a way to specify it explicitly.

Yes. It is currently using a random uuid. I was planning to update
that code to instead use the uuid provided in the
--controller-quorum-voters flag. For example, if the node.id in the
--config file matches one of the replica-id in
--controller-quorum-voters, use the specified replica-uuid for the
directory.id of the metadata.log.dir.

> 2. Do we need the AddVoters and RemoveVoters control records? Perhaps the
> VotersRecord is sufficient since changes to the voter set will be rare.

Yes. We can remove them and I'll remove them. For context, I
originally only had AddVoterRecord and RemoveVoterRecord. When
fleshing out the details of the bootstrapping process, I realized that
I needed a control record that would override the voter set and not do
an incremental change. I needed this functionality because there is no
guarantee that the content of the bootstrap checkpoint
(...0000-...0000.checkpoint) matches in all of the voters.

After I added the VotersRecord, my thinking for keeping AddVoterRecord
and RemoveVoterRecord was to make it explicitly that the protocol only
allows changing one voter at a time. I can instead write a comparison
function that KRaft can use whenever it attempts to write or read a
VotersRecord. The comparison function would fail if all of the
possible majorities of the old voter set don't intersect with all of
the possible majority of the new voter set.

What do you think?

> 3. Why does UpdateVoter need to commit after every leader change?

My thinking is that this algorithm is easier to implement.
Technically, the following (fetching) voter only needs to send an
UpdateVoter RPC when the endpoints known by the leader don't match the
latest endpoint for the voter. This is not something that the follower
can know reliably. This is why I prefered to add an idempotent RPC
like UpdateVoter RPC that the follower voter can perform aggressively
against the leader when the voter discovers a leader.

> 4. Should ReplicaUuid in FetchRequest be a tagged field? It seems like a
> lot of overhead for all consumer fetches.

Yes. I'll make it a tagged field. For now it will only be used by
KRaft. In the future, I can see the broker wanting to use this field
to implement JBOD "reassignment". I don't think consumers will ever
use this field.

> 5. I was looking for the correctness conditions when changing the voter
> set. I recalled something about always taking the voter set from the log
> even if it is not yet known to be committed. Do you have those details
> documented?

Yes. That's correct.

The section Reference Explanation / Voter Changes / Adding Voters has:
"If the new leader supports the current kraft.version, it will write a
AddVoterRecord to the log and immediately update its in-memory quorum
state to include this voter as part of the quorum."

The section Public Interface / Log and Snapshot control records /
AddVoterRecord has:
"KRaft replicas will read all of the control records in the snapshot
and the log irrespective of the commit state and HWM. When a replica
encounters an AddVoterRecord it will add the replica ID and UUID to
its voter set."

The section Public Interface / RPCs / AddVoter / Handling has:
"6. Append the AddVoterRecord to the log.
7. The KRaft internal listener will read this record from the log and
add the voter to the voter set.
8. Wait for the AddVoterRecord to commit using the majority of new voter set."

Thanks,
-- 
-José

Reply via email to