Re: [DISCUSS] KIP-926: introducing acks=min.insync.replicas config

David Jacot Fri, 12 May 2023 02:06:11 -0700

That's fair so I leave it up to you, Luke.

Cheers,
David


On Fri, May 12, 2023 at 10:58 AM Luke Chen <[email protected]> wrote:

> Hi David,
>
> Thanks for the response.
> But I don't think the LEO-based leader election only benefit for this case.
> Like in unclean clear election case, we now randomly chose a out-of-sync
> replica to become the leader.
> This LEO-based leader election will help this case, too.
> Besides, not all producers use `acks=all`, thus, when using `acks=0` or
> `acks=1`, they can also benefit from LEO-based leader election.
>
> That's why I think this could be in a separate KIP, and after that's
> introduced, this KIP will be a further improvement based on that.
> Does that make sense?
>
> Actually, I don't strongly insist in doing this, if you still think they
> should be proposed together, I can update the KIP, too.
>
> Thank you.
> Luke
>
> On Fri, May 12, 2023 at 4:48 PM David Jacot <[email protected]>
> wrote:
>
> > Hi Luke,
> >
> > I disagree with this because we don't need the leader election change on
> > its own if we don't do this KIP. They have to go together or not at all
> in
> > my opinion. We need a KIP which designs the entire solution.
> >
> > Best,
> > David
> >
> > On Fri, May 12, 2023 at 10:33 AM Luke Chen <[email protected]> wrote:
> >
> > > Hi Alexandre,
> > >
> > > Thanks for the thoughts.
> > > I've thought about it, and think I would choose to have a new leader
> > > election method to fix the problem we encountered, not this
> "backup-only"
> > > replica solution.
> > > But this is still an interesting idea. Like what you've said, this
> > solution
> > > can bring many benefits.
> > > So maybe you can create a proposal for it?
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Fri, May 12, 2023 at 4:21 PM Luke Chen <[email protected]> wrote:
> > >
> > > > Hi Haruki,
> > > >
> > > > Yes, this scenario could happen.
> > > > I'm thinking we can fix it in step 6, when controller tried to get
> LEO
> > > > from B,C replicas, the B,C replica should stop fetcher for this
> > partition
> > > > immediately, before returning the LEO.
> > > > About if we need quorum-based or not, We can discuss in another KIP.
> > I'm
> > > > still thinking about it.
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > >
> > > > On Fri, May 12, 2023 at 3:59 PM Luke Chen <[email protected]> wrote:
> > > >
> > > >> Hi David,
> > > >>
> > > >> > It can't be in another KIP as it is required for your proposal to
> > > work.
> > > >> This is also an important part to discuss as it requires the
> > controller
> > > to
> > > >> do more operations on leader changes.
> > > >>
> > > >> Yes, I know this is a requirement for this KIP to work, and need a
> lot
> > > of
> > > >> discussion.
> > > >> So that's why I think it'd be better to have a separate KIP to write
> > the
> > > >> content and discussion.
> > > >> I've put the status of this KIP as "pending" and added a note on the
> > top
> > > >> of this KIP:
> > > >>
> > > >> Note: This KIP requires leader election change, which will be
> proposed
> > > in
> > > >> another KIP.
> > > >>
> > > >> Thanks.
> > > >> Luke
> > > >>
> > > >> On Thu, May 11, 2023 at 11:43 PM Alexandre Dupriez <
> > > >> [email protected]> wrote:
> > > >>
> > > >>> Hi, Luke,
> > > >>>
> > > >>> Thanks for your reply.
> > > >>>
> > > >>> 102. Whether such a replica could become a leader depends on what
> the
> > > >>> end-user wants to use it for and what tradeoffs they wish to make
> > down
> > > >>> the line.
> > > >>>
> > > >>> There are cases, for instance with heterogeneous or interregional
> > > >>> networks, where the difference in latency between subsets of
> brokers
> > > >>> can be high enough for the "slow replicas" to have a detrimental
> > > >>> impact on the ISR traffic they take part in. This can justify
> > > >>> permanently segregating them from ISR traffic by design. And, an
> > > >>> end-user could still prefer to have these "slow replicas" versus
> > > >>> alternative approaches such as mirroring for the benefits they can
> > > >>> bring, for instance: a) they belong to the same cluster with no
> added
> > > >>> admin and ops, b) benefit from a direct, simpler replication path,
> c)
> > > >>> require less infrastructure than a mirrored solution, d) could
> become
> > > >>> unclean leaders for failovers under disaster scenarios such as a
> > > >>> regional service outages.
> > > >>>
> > > >>> Thanks,
> > > >>> Alexandre
> > > >>>
> > > >>> Le jeu. 11 mai 2023 à 14:57, Haruki Okada <[email protected]> a
> > > écrit
> > > >>> :
> > > >>> >
> > > >>> > Hi, Luke.
> > > >>> >
> > > >>> > Though this proposal definitely looks interesting, as others
> > pointed
> > > >>> out,
> > > >>> > the leader election implementation would be the hard part.
> > > >>> >
> > > >>> > And I think even LEO-based-election is not safe, which could
> cause
> > > >>> silent
> > > >>> > committed-data loss easily.
> > > >>> >
> > > >>> > Let's say we have replicas A,B,C and A is the leader initially,
> and
> > > >>> > min.insync.replicas = 2.
> > > >>> >
> > > >>> > - 1. Initial
> > > >>> >     * A(leo=0), B(leo=0), C(leo=0)
> > > >>> > - 2. Produce a message to A
> > > >>> >     * A(leo=1), B(leo=0), C(leo=0)
> > > >>> > - 3. Another producer produces a message to A (i.e. as the
> > different
> > > >>> batch)
> > > >>> >     * A(leo=2), B(leo=0), C(leo=0)
> > > >>> > - 4. C replicates the first batch. offset=1 is committed (by
> > > >>> > acks=min.insync.replicas)
> > > >>> >     * A(leo=2), B(leo=0), C(leo=1)
> > > >>> > - 5. A loses ZK session (or broker session timeout in KRaft)
> > > >>> > - 6. Controller (regardless ZK/KRaft) doesn't store LEO in
> itself,
> > so
> > > >>> it
> > > >>> > needs to interact with each replica. It detects C has the largest
> > LEO
> > > >>> and
> > > >>> > decided to elect C as the new leader
> > > >>> > - 7. Before leader-election is performed, B replicates offset=1,2
> > > from
> > > >>> A.
> > > >>> > offset=2 is committed
> > > >>> >     * This is possible because even if A lost ZK session, A could
> > > >>> handle
> > > >>> > fetch requests for a while.
> > > >>> > - 8. Controller elects C as the new leader. B truncates its
> offset.
> > > >>> > offset=2 is lost silently.
> > > >>> >
> > > >>> > I have a feeling that we need quorum-based data replication? as
> > Divij
> > > >>> > pointed out.
> > > >>> >
> > > >>> >
> > > >>> > 2023年5月11日(木) 22:33 David Jacot <[email protected]>:
> > > >>> >
> > > >>> > > Hi Luke,
> > > >>> > >
> > > >>> > > > Yes, on second thought, I think the new leader election is
> > > >>> required to
> > > >>> > > work
> > > >>> > > for this new acks option. I'll think about it and open another
> > KIP
> > > >>> for it.
> > > >>> > >
> > > >>> > > It can't be in another KIP as it is required for your proposal
> to
> > > >>> work.
> > > >>> > > This is also an important part to discuss as it requires the
> > > >>> controller to
> > > >>> > > do more operations on leader changes.
> > > >>> > >
> > > >>> > > Cheers,
> > > >>> > > David
> > > >>> > >
> > > >>> > > On Thu, May 11, 2023 at 2:44 PM Luke Chen <[email protected]>
> > > wrote:
> > > >>> > >
> > > >>> > > > Hi Ismael,
> > > >>> > > > Yes, on second thought, I think the new leader election is
> > > >>> required to
> > > >>> > > work
> > > >>> > > > for this new acks option. I'll think about it and open
> another
> > > KIP
> > > >>> for
> > > >>> > > it.
> > > >>> > > >
> > > >>> > > > Hi Divij,
> > > >>> > > > Yes, I agree with all of them. I'll think about it and let
> you
> > > >>> know how
> > > >>> > > we
> > > >>> > > > can work together.
> > > >>> > > >
> > > >>> > > > Hi Alexandre,
> > > >>> > > > > 100. The KIP makes one statement which may be considered
> > > >>> critical:
> > > >>> > > > "Note that in acks=min.insync.replicas case, the slow
> follower
> > > >>> might
> > > >>> > > > be easier to become out of sync than acks=all.". Would you
> have
> > > >>> some
> > > >>> > > > data on that behaviour when using the new ack semantic? It
> > would
> > > be
> > > >>> > > > interesting to analyse and especially look at the percentage
> of
> > > >>> time
> > > >>> > > > the number of replicas in ISR is reduced to the configured
> > > >>> > > > min.insync.replicas.
> > > >>> > > >
> > > >>> > > > The comparison data would be interesting. I can have a test
> > when
> > > >>> > > available.
> > > >>> > > > But this KIP will be deprioritized because there should be a
> > > >>> > > pre-requisite
> > > >>> > > > KIP for it.
> > > >>> > > >
> > > >>> > > > > A (perhaps naive) hypothesis would be that the
> > > >>> > > > new ack semantic indeed provides better produce latency, but
> at
> > > the
> > > >>> > > > cost of precipitating the slowest replica(s) out of the ISR?
> > > >>> > > >
> > > >>> > > > Yes, it could be.
> > > >>> > > >
> > > >>> > > > > 101. I understand the impact on produce latency, but I am
> not
> > > >>> sure
> > > >>> > > > about the impact on durability. Is your durability model
> built
> > > >>> against
> > > >>> > > > the replication factor or the number of min insync replicas?
> > > >>> > > >
> > > >>> > > > Yes, and also the new LEO-based leader election (not proposed
> > > yet).
> > > >>> > > >
> > > >>> > > > > 102. Could a new type of replica which would not be allowed
> > to
> > > >>> enter
> > > >>> > > > the ISR be an alternative? Such replica could attempt
> > replication
> > > >>> on a
> > > >>> > > > best-effort basis and would provide the permanent guarantee
> not
> > > to
> > > >>> > > > interfere with foreground traffic.
> > > >>> > > >
> > > >>> > > > You mean a backup replica, which will never become leader
> > > >>> (in-sync),
> > > >>> > > right?
> > > >>> > > > That's an interesting thought and might be able to become a
> > > >>> workaround
> > > >>> > > with
> > > >>> > > > the existing leader election. Let me think about it.
> > > >>> > > >
> > > >>> > > > Hi qiangLiu,
> > > >>> > > >
> > > >>> > > > > It's a good point that add this config and get better P99
> > > >>> latency, but
> > > >>> > > is
> > > >>> > > > this changing the meaning of "in sync replicas"? consider a
> > > >>> situation
> > > >>> > > with
> > > >>> > > > "replica=3 acks=2", when two broker fail and left only the
> > broker
> > > >>> that
> > > >>> > > > does't have the message, it is in sync, so will be elected as
> > > >>> leader,
> > > >>> > > will
> > > >>> > > > it cause a NOT NOTICED lost of acked messages?
> > > >>> > > >
> > > >>> > > > Yes, it will, so the `min.insync.replicas` config in the
> > > >>> broker/topic
> > > >>> > > level
> > > >>> > > > should be set correctly. In your example, it should be set to
> > 2,
> > > >>> so that
> > > >>> > > > when 2 replicas down, no new message write will be
> successful.
> > > >>> > > >
> > > >>> > > >
> > > >>> > > > Thank you.
> > > >>> > > > Luke
> > > >>> > > >
> > > >>> > > >
> > > >>> > > > On Thu, May 11, 2023 at 12:16 PM 67 <[email protected]> wrote:
> > > >>> > > >
> > > >>> > > > > Hi Luke,
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > It's a good point that add this config and get better P99
> > > >>> latency, but
> > > >>> > > is
> > > >>> > > > > this changing the meaning of "in sync replicas"? consider a
> > > >>> situation
> > > >>> > > > with
> > > >>> > > > > "replica=3 acks=2", when two broker fail and left only the
> > > >>> broker that
> > > >>> > > > > does't have the message, it is in sync, so will be elected
> as
> > > >>> leader,
> > > >>> > > > will
> > > >>> > > > > it cause a *NOT NOTICED* lost of acked messages?
> > > >>> > > > >
> > > >>> > > > > qiangLiu
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > 在2023年05月10 12时58分，"Ismael Juma"<[email protected]>写道：
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > Hi Luke,
> > > >>> > > > >
> > > >>> > > > > As discussed in the other KIP, there are some subtleties
> when
> > > it
> > > >>> comes
> > > >>> > > to
> > > >>> > > > > the semantics of the system if we don't wait for all
> members
> > of
> > > >>> the isr
> > > >>> > > > > before we ack. I don't understand why you say the leader
> > > election
> > > >>> > > > question
> > > >>> > > > > is out of scope - it seems to be a core aspect to me.
> > > >>> > > > >
> > > >>> > > > > Ismael
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > > On Wed, May 10, 2023, 8:50 AM Luke Chen <[email protected]
> >
> > > >>> wrote:
> > > >>> > > > >
> > > >>> > > > > > Hi Ismael,
> > > >>> > > > > >
> > > >>> > > > > > No, I didn't know about this similar KIP! I hope I've
> known
> > > >>> that so
> > > >>> > > > that I
> > > >>> > > > > > don't need to spend time to write it again! :(
> > > >>> > > > > > I checked the KIP and all the discussions (here
> > > >>> > > > > > <
> > > >>>
> https://lists.apache.org/[email protected]:gte=100d:KIP-250
> > > >>> > > >).
> > > >>> > > > I
> > > >>> > > > > > think the consensus is that adding a client config to
> > > >>> `acks=quorum`
> > > >>> > > is
> > > >>> > > > > > fine.
> > > >>> > > > > > This comment
> > > >>> > > > > > <
> > > >>> https://lists.apache.org/thread/p77pym5sxpn91r8j364kmmf3qp5g65rn>
> > > >>> > > > from
> > > >>> > > > > > Guozhang pretty much concluded what I'm trying to do.
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > > *1. Add one more value to client-side acks config:   0:
> no
> > > acks
> > > >>> > > needed
> > > >>> > > > at
> > > >>> > > > > > all.   1: ack from the leader.   all: ack from ALL the
> ISR
> > > >>> replicas
> > > >>> > > > > >  quorum: this is the new value, it requires ack from
> enough
> > > >>> number of
> > > >>> > > > ISR
> > > >>> > > > > > replicas no smaller than majority of the replicas AND no
> > > >>> smaller
> > > >>> > > > > > than{min.isr}.2. Clarify in the docs that if a user wants
> > to
> > > >>> > > tolerate X
> > > >>> > > > > > failures, she needs to set client acks=all or acks=quorum
> > > >>> (better
> > > >>> > > tail
> > > >>> > > > > > latency than "all") with broker {min.sir} to be X+1;
> > however,
> > > >>> "all"
> > > >>> > > is
> > > >>> > > > not
> > > >>> > > > > > necessarily stronger than "quorum".*
> > > >>> > > > > >
> > > >>> > > > > > Concerns from KIP-250 are:
> > > >>> > > > > > 1. Introducing a new leader LEO based election method.
> This
> > > is
> > > >>> not
> > > >>> > > > clear in
> > > >>> > > > > > the KIP-250 and needs more discussion
> > > >>> > > > > > 2. The KIP-250 also tried to optimize the consumer
> latency
> > to
> > > >>> read
> > > >>> > > > messages
> > > >>> > > > > > beyond high watermark, which also has some discussion
> about
> > > >>> how to
> > > >>> > > > achieve
> > > >>> > > > > > that, and no conclusion
> > > >>> > > > > >
> > > >>> > > > > > Both of the above 2 concerns are out of the scope of my
> > > >>> current KIP.
> > > >>> > > > > > So, I think it's good to provide this `acks=quorum` or
> > > >>> > > > > > `acks=min.insync.replicas` option to users to give them
> > > another
> > > >>> > > choice.
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > > Thank you.
> > > >>> > > > > > Luke
> > > >>> > > > > >
> > > >>> > > > > >
> > > >>> > > > > > On Wed, May 10, 2023 at 8:54 AM Ismael Juma <
> > > [email protected]
> > > >>> >
> > > >>> > > wrote:
> > > >>> > > > > >
> > > >>> > > > > > > Hi Luke,
> > > >>> > > > > > >
> > > >>> > > > > > > Are you aware of
> > > >>> > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > >
> > > >>> > >
> > > >>>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-250+Add+Support+for+Quorum-based+Producer+Acknowledgment
> > > >>> > > > > > > ?
> > > >>> > > > > > >
> > > >>> > > > > > > Ismael
> > > >>> > > > > > >
> > > >>> > > > > > > On Tue, May 9, 2023 at 10:14 PM Luke Chen <
> > > [email protected]
> > > >>> >
> > > >>> > > wrote:
> > > >>> > > > > > >
> > > >>> > > > > > > > Hi all,
> > > >>> > > > > > > >
> > > >>> > > > > > > > I'd like to start a discussion for the KIP-926:
> > > introducing
> > > >>> > > > > > > > acks=min.insync.replicas config. This KIP is to
> > introduce
> > > >>> > > > > > > > `acks=min.insync.replicas` config value in producer,
> to
> > > >>> improve
> > > >>> > > the
> > > >>> > > > > > write
> > > >>> > > > > > > > throughput and still guarantee high durability.
> > > >>> > > > > > > >
> > > >>> > > > > > > > Please check the link for more detail:
> > > >>> > > > > > > >
> > > >>> > > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > >
> > > >>> > >
> > > >>>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-926%3A+introducing+acks%3Dmin.insync.replicas+config
> > > >>> > > > > > > >
> > > >>> > > > > > > > Any feedback is welcome.
> > > >>> > > > > > > >
> > > >>> > > > > > > > Thank you.
> > > >>> > > > > > > > Luke
> > > >>> > > > > > > >
> > > >>> > > > > > >
> > > >>> > > > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > > >
> > > >>> > > >
> > > >>> > >
> > > >>> >
> > > >>> >
> > > >>> > --
> > > >>> > ========================
> > > >>> > Okada Haruki
> > > >>> > [email protected]
> > > >>> > ========================
> > > >>>
> > > >>
> > >
> >
>

Re: [DISCUSS] KIP-926: introducing acks=min.insync.replicas config

Reply via email to