Thanks Jason and David for your feedback. See my comments below.

David wrote:
> 1) Does recovering from an unclean state bump the leader epoch?

Looking at the controller code, the leader epoch is only increased if
the leader id changes, so recovery by itself should not bump it.
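
To make that rule concrete, here is a rough sketch in Python. It is
only an illustration of the behavior described above, not the
controller code; "PartitionState" and "apply_change" are made-up
names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartitionState:
    # Hypothetical, simplified view of the per-partition state.
    leader: int
    leader_epoch: int
    is_recovering: bool


def apply_change(state: PartitionState,
                 new_leader: Optional[int] = None,
                 is_recovering: Optional[bool] = None) -> PartitionState:
    """Return the next state; the epoch is bumped only if the leader id changes."""
    leader = state.leader if new_leader is None else new_leader
    recovering = state.is_recovering if is_recovering is None else is_recovering
    epoch = state.leader_epoch + 1 if leader != state.leader else state.leader_epoch
    return PartitionState(leader, epoch, recovering)
```

In this sketch, clearing the recovering flag on the same leader keeps
the epoch, while electing a different leader increments it.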

David wrote:
> 2) The name of "NewIsUnclean" field in AlterIsrRequest is a little strange.
> From the description, it sounds like this will be used to by the broker to
> indicate to the controller that it has recovered from unclean leader
> election. If that's the case, maybe something like "RecoverFromUnclean"
> would be better?

Jason wrote:
> By the way, I do find the naming of the "IsUnclean" field a tad awkward.
> The naming suggests that it reflects upon the election, but then it is
> strange that the election becomes clean through recovery (which obviously
> cannot restore the lost data). An alternative name might be
> "UncleanRecoveryRequired." Another option might be to consider it more of a
> partition state. After an unclean election, then the state might be
> UNCLEAN_ELECTED. After recovery, it might transition to UNCLEAN_RECOVERED.
> Then at least we keep track of the fact that the current leader was
> uncleanly elected. Not sure how important that is, just a thought..

Yes, I agree, naming is hard. How about "IsRecovering" for all of the
messages and ZK state? I am leaning towards "IsRecovering" instead of
"UncleanRecoveryRequired", "RecoverFromUnclean" or "IsUnclean" because
this state is read by both the leader and the followers.

David wrote:
> 3) Will followers try to fetch from an unclean leader? Or will they wait
> for a LISR with unclean=false (or a PartitionChangeRecord with
> unclean=false)?

Jason wrote:
> "This means that the leader will not allow followers to join the ISR until it 
> has recovered from the unclean leader election."
>
> If I understand correctly, the main reason for this is to avoid the need to
> propagate the "IsUnclean" flag between elections. It ensures that we cannot
> have a "clean" election until the recovery has completed. On the other
> hand, if we need to do another unclean election because the recovering
> leader failed, then we would get the "IsUnclean" flag naturally. Are there
> any additional limitations we should consider while the unclean leader is
> recovering? For example, should we not allow consumers to read from the
> partition until the recovery has completed as well?

I'll update the KIP with this information. The leader will return
"NOT_LEADER_OR_FOLLOWER" for any partition that is still recovering
for Fetch, Produce, OffsetsForLeaderEpoch and DeleteRecords requests.
This error type is retriable by the clients.
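
As a sketch of the check I have in mind on the leader (illustrative
Python only; "check_request" and the string constants are
hypothetical, and the real check would live in the broker's request
handling):

```python
# Request types that the recovering leader rejects, per the list above.
GATED_APIS = {"Fetch", "Produce", "OffsetsForLeaderEpoch", "DeleteRecords"}


def check_request(api: str, is_recovering: bool) -> str:
    """Return the error name the leader would answer with for this partition."""
    if is_recovering and api in GATED_APIS:
        # Retriable on the client side, so callers back off and try again.
        return "NOT_LEADER_OR_FOLLOWER"
    return "NONE"
```

Request types outside of that list are not affected by the flag in
this sketch.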

For Fetch requests, the replicas will handle this error by backing off
by "replica.fetch.backoff.ms". When the cluster is getting upgraded it
is possible for the partition leader to receive fetch requests from
replicas that do and do not understand the "IsRecovering" field. The
consumers will handle this error by queuing a full metadata request
for the next metadata request interval.

For Produce requests, the producers will requeue the request if it is
within the retry window.
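
Roughly, the producer-side decision looks like the sketch below. This
is simplified Python, not the producer code: "should_requeue" is a
made-up name, and I am assuming the retry window is a fixed duration
measured from when the batch was created.

```python
# Errors treated as retriable in this sketch (only the one discussed here).
RETRIABLE_ERRORS = {"NOT_LEADER_OR_FOLLOWER"}


def should_requeue(error: str, created_ms: int, now_ms: int,
                   retry_window_ms: int) -> bool:
    """Requeue only retriable failures that are still inside the retry window."""
    retriable = error in RETRIABLE_ERRORS
    within_window = (now_ms - created_ms) < retry_window_ms
    return retriable and within_window
```

Once the window has elapsed the request is failed back to the caller
instead of being requeued.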

David wrote:
> 4) Is there any other way for a partition to recover from an unclean state
> other than the leader sending AISR with NewIsUnclean=false? Is it possible
> for a leader to fail the recovery process? Are further unclean election
> attempts by a user blocked until we recover?

If there is a bug where the leader is not recovering from the unclean
leader election and setting the "IsRecovering" field correctly, the
user is going to have to downgrade or upgrade the broker to a version
of the software that doesn't have this bug. The user can perform
another unclean leader election as long as there is no leader and
there is another online replica.

Thanks
-Jose
