Hi, Luke. Though this proposal definitely looks interesting, as others pointed out, the leader election implementation would be the hard part.
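
To make the walkthrough that follows easier to check, here is a minimal sketch of its eight steps in Python. It is a toy model, not Kafka code: the names `committed_leo` and `simulate` are hypothetical, each replica is reduced to a bare LEO counter, and "committed" is assumed to mean "held by at least min.insync.replicas replicas".

```python
MIN_INSYNC_REPLICAS = 2


def committed_leo(leo, min_isr=MIN_INSYNC_REPLICAS):
    """Highest LEO reached by at least `min_isr` replicas (toy definition)."""
    return sorted(leo.values(), reverse=True)[min_isr - 1]


def simulate():
    # Step 1: initial state, A is the leader.
    leo = {"A": 0, "B": 0, "C": 0}

    # Steps 2-3: two producers each append one batch to leader A.
    leo["A"] += 1
    leo["A"] += 1

    # Step 4: C replicates the first batch, so LEO 1 is now on two replicas.
    leo["C"] = 1
    committed = committed_leo(leo)

    # Step 5: A loses its session.
    # Step 6: the controller asks the remaining replicas for their LEO
    # and picks the largest one it sees at that moment.
    snapshot = {replica: leo[replica] for replica in ("B", "C")}
    new_leader = max(snapshot, key=snapshot.get)

    # Step 7: before the election takes effect, B still fetches from A.
    leo["B"] = leo["A"]
    committed = max(committed, committed_leo(leo))

    # Step 8: the election takes effect; B truncates to the new leader's LEO.
    leo["B"] = min(leo["B"], leo[new_leader])

    lost = committed - leo[new_leader]
    return new_leader, committed, leo[new_leader], lost


if __name__ == "__main__":
    leader, committed, leader_leo, lost = simulate()
    print(f"new leader={leader}, committed LEO={committed}, "
          f"leader LEO={leader_leo}, lost records={lost}")
```

Under these assumptions the controller's choice is based on a stale snapshot: C is elected with LEO 1 although LEO 2 had already been committed, so one acknowledged record is dropped.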
I also think even LEO-based election is not safe: it could easily cause silent loss of committed data.

Let's say we have replicas A, B, C, with A as the initial leader and min.insync.replicas = 2.

- 1. Initial state
    * A(leo=0), B(leo=0), C(leo=0)
- 2. A producer produces a message to A
    * A(leo=1), B(leo=0), C(leo=0)
- 3. Another producer produces a message to A (i.e. as a different batch)
    * A(leo=2), B(leo=0), C(leo=0)
- 4. C replicates the first batch. offset=1 is committed (by acks=min.insync.replicas)
    * A(leo=2), B(leo=0), C(leo=1)
- 5. A loses its ZK session (or hits the broker session timeout in KRaft)
- 6. The controller (regardless of ZK/KRaft) doesn't store LEOs itself, so it needs to interact with each replica. It detects that C has the largest LEO and decides to elect C as the new leader
- 7. Before the leader election is performed, B replicates offset=1,2 from A. offset=2 is committed
    * This is possible because even if A has lost its ZK session, A can still handle fetch requests for a while.
- 8. The controller elects C as the new leader. B truncates its log. offset=2 is lost silently.

I have a feeling that we need quorum-based data replication, as Divij pointed out.

On Thu, May 11, 2023 at 22:33, David Jacot <dja...@confluent.io.invalid> wrote:

> Hi Luke,
>
> > Yes, on second thought, I think the new leader election is required to work
> > for this new acks option. I'll think about it and open another KIP for it.
>
> It can't be in another KIP as it is required for your proposal to work.
> This is also an important part to discuss as it requires the controller to
> do more operations on leader changes.
>
> Cheers,
> David
>
> On Thu, May 11, 2023 at 2:44 PM Luke Chen <show...@gmail.com> wrote:
>
> > Hi Ismael,
> > Yes, on second thought, I think the new leader election is required to work
> > for this new acks option. I'll think about it and open another KIP for it.
> >
> > Hi Divij,
> > Yes, I agree with all of them. I'll think about it and let you know how we
> > can work together.
> >
> > Hi Alexandre,
> > > 100.
The KIP makes one statement which may be considered critical:
> > > "Note that in acks=min.insync.replicas case, the slow follower might
> > > be easier to become out of sync than acks=all.". Would you have some
> > > data on that behaviour when using the new ack semantic? It would be
> > > interesting to analyse and especially look at the percentage of time
> > > the number of replicas in ISR is reduced to the configured
> > > min.insync.replicas.
> >
> > The comparison data would be interesting. I can have a test when available.
> > But this KIP will be deprioritized because there should be a pre-requisite
> > KIP for it.
> >
> > > A (perhaps naive) hypothesis would be that the
> > > new ack semantic indeed provides better produce latency, but at the
> > > cost of precipitating the slowest replica(s) out of the ISR?
> >
> > Yes, it could be.
> >
> > > 101. I understand the impact on produce latency, but I am not sure
> > > about the impact on durability. Is your durability model built against
> > > the replication factor or the number of min insync replicas?
> >
> > Yes, and also the new LEO-based leader election (not proposed yet).
> >
> > > 102. Could a new type of replica which would not be allowed to enter
> > > the ISR be an alternative? Such replica could attempt replication on a
> > > best-effort basis and would provide the permanent guarantee not to
> > > interfere with foreground traffic.
> >
> > You mean a backup replica, which will never become leader (in-sync), right?
> > That's an interesting thought and might be able to become a workaround with
> > the existing leader election. Let me think about it.
> >
> > Hi qiangLiu,
> >
> > > It's a good point that add this config and get better P99 latency, but is
> > > this changing the meaning of "in sync replicas"?
consider a situation with "replica=3 acks=2", when two broker fail and left only the broker that
> > > doesn't have the message, it is in sync, so will be elected as leader, will
> > > it cause a NOT NOTICED lost of acked messages?
> >
> > Yes, it will, so the `min.insync.replicas` config in the broker/topic level
> > should be set correctly. In your example, it should be set to 2, so that
> > when 2 replicas down, no new message write will be successful.
> >
> > Thank you.
> > Luke
> >
> > On Thu, May 11, 2023 at 12:16 PM 67 <6...@gd67.com> wrote:
> >
> > > Hi Luke,
> > >
> > > It's a good point that add this config and get better P99 latency, but is
> > > this changing the meaning of "in sync replicas"? consider a situation with
> > > "replica=3 acks=2", when two broker fail and left only the broker that
> > > doesn't have the message, it is in sync, so will be elected as leader, will
> > > it cause a *NOT NOTICED* lost of acked messages?
> > >
> > > qiangLiu
> > >
> > > On May 10, 2023 at 12:58, "Ismael Juma" <ism...@juma.me.uk> wrote:
> > >
> > > Hi Luke,
> > >
> > > As discussed in the other KIP, there are some subtleties when it comes to
> > > the semantics of the system if we don't wait for all members of the isr
> > > before we ack. I don't understand why you say the leader election question
> > > is out of scope - it seems to be a core aspect to me.
> > >
> > > Ismael
> > >
> > > On Wed, May 10, 2023, 8:50 AM Luke Chen <show...@gmail.com> wrote:
> > >
> > > > Hi Ismael,
> > > >
> > > > No, I didn't know about this similar KIP! I hope I've known that so that I
> > > > don't need to spend time to write it again! :(
> > > > I checked the KIP and all the discussions (here
> > > > <https://lists.apache.org/list?dev@kafka.apache.org:gte=100d:KIP-250>). I
> > > > think the consensus is that adding a client config to `acks=quorum` is
> > > > fine.
> > > > This comment
> > > > <https://lists.apache.org/thread/p77pym5sxpn91r8j364kmmf3qp5g65rn> from
> > > > Guozhang pretty much concluded what I'm trying to do.
> > > >
> > > > *1. Add one more value to client-side acks config:
> > > >     0: no acks needed at all.
> > > >     1: ack from the leader.
> > > >     all: ack from ALL the ISR replicas
> > > >     quorum: this is the new value, it requires ack from enough number of ISR
> > > >     replicas no smaller than majority of the replicas AND no smaller
> > > >     than {min.isr}.
> > > > 2. Clarify in the docs that if a user wants to tolerate X
> > > > failures, she needs to set client acks=all or acks=quorum (better tail
> > > > latency than "all") with broker {min.isr} to be X+1; however, "all" is not
> > > > necessarily stronger than "quorum".*
> > > >
> > > > Concerns from KIP-250 are:
> > > > 1. Introducing a new leader LEO based election method. This is not clear in
> > > > the KIP-250 and needs more discussion
> > > > 2. The KIP-250 also tried to optimize the consumer latency to read messages
> > > > beyond high watermark, which also has some discussion about how to achieve
> > > > that, and no conclusion
> > > >
> > > > Both of the above 2 concerns are out of the scope of my current KIP.
> > > > So, I think it's good to provide this `acks=quorum` or
> > > > `acks=min.insync.replicas` option to users to give them another choice.
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Wed, May 10, 2023 at 8:54 AM Ismael Juma <ism...@juma.me.uk> wrote:
> > > >
> > > > > Hi Luke,
> > > > >
> > > > > Are you aware of
> > > > >
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-250+Add+Support+for+Quorum-based+Producer+Acknowledgment
> > > > > ?
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Tue, May 9, 2023 at 10:14 PM Luke Chen <show...@gmail.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I'd like to start a discussion for the KIP-926: introducing
> > > > > > acks=min.insync.replicas config. This KIP is to introduce
> > > > > > `acks=min.insync.replicas` config value in producer, to improve the write
> > > > > > throughput and still guarantee high durability.
> > > > > >
> > > > > > Please check the link for more detail:
> > > > > >
> > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-926%3A+introducing+acks%3Dmin.insync.replicas+config
> > > > > >
> > > > > > Any feedback is welcome.
> > > > > >
> > > > > > Thank you.
> > > > > > Luke

--
========================
Okada Haruki
ocadar...@gmail.com
========================