Hi Justine Thanks for the questions! *a. For my understanding, will we block replication? Or just the high watermark advancement?* - The replication will not be blocked. The followers are free to replicate messages above the HWM. Only HWM advancement is blocked.
b. *Also in the acks=1 case, if folks want to continue the previous behavior, they also need to set min.insync.replicas to 1, correct?* - If the clients only send ack=1 messages and minISR=2. The HWM behavior will only be different when there is 1 replica in the ISR. In this case, the min ISR does not do much in the current system. It is kind of a trade-off but we think it is ok. c. *The KIP seems to suggest that we remove from ELR when we start up again and notice we do not have the clean shutdown file. Is there a chance we have an offline broker in ELR that had an unclean shutdown that we elect as a leader before we get the change to realize the shutdown was unclean?* * - *The controller will only elect an unfenced(online) replica as the leader. If a broker has an unclean shutdown, it should register to the controller first(where it has to declair whether it is a clean/unclean shutdown) and then start to serve broker requests. So 1. If the broker has an unclean shutdown before the controller is aware that the replica is offline, then the broker can become the leader temporarily. But it can't serve any Fetch requests before it registers again, and that's when the controller will re-elect a leader. 2. If the controller knows the replica is offline(missing heartbeats from the broker for a while) before the broker re-registers, the broker can't be elected as a leader. d. *Would this be the case for strictly a smaller min ISR?* - Yes, only when we have a smaller min ISR. Once the leader is aware of the minISR change, the HWM can advance and make the current ELR obsolete. So the controller should clear the ELR if the ISR >= the new min ISR. e. *I thought we said the above "Last Leader” behavior can’t be maintained with an empty ISR and it should be removed."* - As the Kip is a big one, we have to consider delivering it in phases. If only the Unclean Recovery is delivered, we do not touch the ISR then the ISR behavior will be the same as the current. I am open to the proposal that directly starting unclean recovery if the last leader fails. Let's see if other folks hope to have more if Unclean Recover delivers first. On Tue, Aug 29, 2023 at 4:53 PM Justine Olshan <jols...@confluent.io.invalid> wrote: > Hey Calvin, > > Thanks for the KIP. This will close some of the gaps in leader election! I > has a few questions: > > *>* *High Watermark can only advance if the ISR size is larger or equal > to min.insync.replicas*. > > For my understanding, will we block replication? Or just the high watermark > advancement? > Also in the acks=1 case, if folks want to continue the previous behavior, > they also need to set min.insync.replicas to 1, correct? It seems like this > change takes some control away from clients when it comes to durability vs > availability. > > *> * > *ELR + ISR size will not be dropped below the min ISR unless the controller > discovers an ELR member has an unclean shutdown. * > The KIP seems to suggest that we remove from ELR when we start up again and > notice we do not have the clean shutdown file. Is there a chance we have an > offline broker in ELR that had an unclean shutdown that we elect as a > leader before we get the change to realize the shutdown was unclean? > This seems like it could cause some problems. I may have missed how we > avoid this scenario though. > > *> When updating the config **min.insync.replicas, * > *if the new min ISR <= current ISR, the ELR will be removed.*Would this be > the case for strictly a smaller min ISR? I suppose if we increase the ISR, > we can't reason about ELR. Can we reason about high water mark in this > case--seems like we will have the broker out of ISR not in ISR or ELR? > (Forgive me if we can't increase min ISR if the increase will put us under > it) > > *> Unclean recovery. * > > - *The unclean leader election will be replaced by the unclean > recovery.* > - *unclean.leader.election.enable will only be replaced by > the unclean.recovery.strategy after ELR is delivered.* > - *As there is no change to the ISR, the "last known leader" behavior is > maintained.* > > What does "last known leader behavior maintained" mean here? I thought we > said *"*The above “*Last Leader” behavior can’t be maintained with an empty > ISR and it should be removed." *My understanding is once metadata version > is updated we will always take the more thoughtful unclean election process > (ie, inspect the logs) > > Overall though, the general KIP is pretty solid. Looking at the rejected > alternatives, it looks like a lot was considered, so it's nice to see the > final proposal. > > Justine > > On Mon, Aug 14, 2023 at 8:50 AM Calvin Liu <ca...@confluent.io.invalid> > wrote: > > > 1. Yes, the new protocol requires 2 things to advance the HWM. a) The > > messages have been replicated to the controller-committed ISR members. > > b) > > The number of ISR members should be at least the min ISR. > > 2. With the current protocol, we are not able to select broker 1 as > the > > leader. If we first imply we have the new HWM requirement in place, > then > > broker 1 is a good candidate to choose. The following part of the KIP > > (ELR) > > part will explain a new mechanism to enable us to choose broker 1. > > Note, if > > both HWM and ELR are in place, broker 1 will be actually elected in > T3. > > > > > > On Fri, Aug 11, 2023 at 10:05 AM Jeff Kim <jeff....@confluent.io.invalid > > > > wrote: > > > > > Hi Calvin, > > > > > > Thanks for the KIP! I'm still digesting it but I have two questions: > > > > > > > In the scenario raised in the motivation section, the server may > > receive > > > ack=1 messages during T1 and advance High Watermark when the leader > > > is the only one in ISR. > > > > > > To confirm, the current protocol allows advancing the HWM if all > brokers > > in > > > the ISR append to their logs (in this case only the leader). And we're > > > proposing > > > to advance the HWM only when <at least min.insync.replicas> brokers > > > replicate. Is this correct? > > > > > > > Then, if we elect broker 1 as the leader at T4, though we can > guarantee > > > the safety of ack=all messages, the High Watermark may move backward > > > which causes further impacts on the consumers. > > > > > > How can broker 1 become the leader if it was ineligible in T3? Or are > > > you referring to broker 2? > > > > > > Thanks, > > > Jeff > > > > > > On Thu, Aug 10, 2023 at 6:48 PM Calvin Liu <ca...@confluent.io.invalid > > > > > wrote: > > > > > > > Hi everyone, > > > > I'd like to discuss a series of enhancement to the replication > > protocol. > > > > > > > > A partition replica can experience local data loss in unclean > shutdown > > > > scenarios where unflushed data in the OS page cache is lost - such as > > an > > > > availability zone power outage or a server error. The Kafka > replication > > > > protocol is designed to handle these situations by removing such > > replicas > > > > from the ISR and only re-adding them once they have caught up and > > > therefore > > > > recovered any lost data. This prevents replicas that lost an > arbitrary > > > log > > > > suffix, which included committed data, from being elected leader. > > > > However, there is a "last replica standing" state which when combined > > > with > > > > a data loss unclean shutdown event can turn a local data loss > scenario > > > into > > > > a global data loss scenario, i.e., committed data can be removed from > > all > > > > replicas. When the last replica in the ISR experiences an unclean > > > shutdown > > > > and loses committed data, it will be reelected leader after starting > up > > > > again, causing rejoining followers to truncate their logs and thereby > > > > removing the last copies of the committed records which the leader > lost > > > > initially. > > > > > > > > The new KIP will maximize the protection and provides MinISR-1 > > tolerance > > > to > > > > data loss unclean shutdown events. > > > > > > > > > > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-966%3A+Eligible+Leader+Replicas > > > > > > > > > >