Hi all,

@Sophie - I like the sound of the dual-protocol default. The smooth upgrade
path it permits sounds fantastic!
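
For anyone skimming the thread, here's how I understand the dual-protocol
default playing out. This is a toy Python model, not Kafka client code: the
function names and the SUPPORTED table are made up for illustration, and the
selection rules are my reading of KIP-429 and of how the group coordinator
picks an assignor, so treat it as a sketch.

```python
# Toy model only; none of these names are Kafka client APIs.
EAGER, COOPERATIVE = "eager", "cooperative"

# Rebalance protocols each assignor supports (assumption based on KIP-429).
SUPPORTED = {
    "range": {EAGER},
    "cooperative-sticky": {EAGER, COOPERATIVE},
}


def rebalance_protocol(assignors):
    """Return the most advanced protocol that every configured assignor supports."""
    common = set.intersection(*(SUPPORTED[a] for a in assignors))
    return COOPERATIVE if COOPERATIVE in common else EAGER


def select_assignor(members):
    """Model the coordinator's choice for a group.

    `members` is a list of per-member preference lists. Each member votes for
    its most preferred assignor among those that all members support, and the
    assignor with the most votes wins.
    """
    common = set.intersection(*(set(prefs) for prefs in members))
    if not common:
        raise ValueError("no assignor is supported by every member")
    votes = {}
    for prefs in members:
        choice = next(a for a in prefs if a in common)
        votes[choice] = votes.get(choice, 0) + 1
    return max(votes, key=votes.get)


old_default = ["range"]
new_default = ["cooperative-sticky", "range"]

# Mid-upgrade: one member is still on the old default.
mixed_group = select_assignor([old_default, new_default, new_default])
# Fully upgraded group.
upgraded_group = select_assignor([new_default, new_default, new_default])
```

In this model the mixed group stays on range and the fully upgraded group
moves to cooperative-sticky, but rebalance_protocol(new_default) is still
eager because range remains in the list. As I read it, that is the gap
KAFKA-12477 is meant to close.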

@Luke - Do you think we can also include Connect in this KIP? Right now we
don't set any custom partition assignment strategies for the consumer
groups we bring up for sink tasks, and if we continue to just use the
default, the assignment strategy for those consumer groups would change on
Connect clusters once people upgrade to 3.0. I think this is fine (assuming
we can take care of https://issues.apache.org/jira/browse/KAFKA-12487
before then, which I'm fairly optimistic about), but it might be worth a
sentence or two in the KIP explaining that the change in default will
intentionally propagate to Connect. And, if we think Connect should be left
out of this change and stay on the range assignor instead, we should
probably call that fact out in the KIP as well and state that Connect will
now override the default partition assignment strategy to be the range
assignor (assuming the user hasn't specified a value for
consumer.partition.assignment.strategy in their worker config or for
consumer.override.partition.assignment.strategy in their connector config).
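
To make the precedence I have in mind concrete, here is a rough sketch in
Python. It is illustrative only: effective_assignment_strategy and
connect_default are hypothetical names rather than Connect internals, and the
sketch ignores the connector client config override policy.

```python
# Illustrative precedence only; not how Connect is actually implemented.
CONNECTOR_KEY = "consumer.override.partition.assignment.strategy"
WORKER_KEY = "consumer.partition.assignment.strategy"


def effective_assignment_strategy(worker_config, connector_config, connect_default=None):
    """Resolve the assignment strategy for a sink task's consumer group.

    Order: connector-level override, then worker-level consumer setting, then
    whatever Connect itself chooses. connect_default=None stands for "do not
    override, let the consumer's own default apply".
    """
    if CONNECTOR_KEY in connector_config:
        return connector_config[CONNECTOR_KEY]
    if WORKER_KEY in worker_config:
        return worker_config[WORKER_KEY]
    return connect_default


RANGE = "org.apache.kafka.clients.consumer.RangeAssignor"

# Option A: Connect follows the consumer default (nothing is overridden).
follows_default = effective_assignment_strategy({}, {})
# Option B: Connect pins the range assignor unless the user says otherwise.
pinned_range = effective_assignment_strategy({}, {}, connect_default=RANGE)
```

Under option A the new consumer default propagates to sink tasks; under
option B it only changes when a user sets one of the two properties above.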

Cheers,

Chris

On Wed, Mar 31, 2021 at 12:18 AM Sophie Blee-Goldman
<sop...@confluent.io.invalid> wrote:

> Ok I'm still fleshing out all the details of KAFKA-12477 but I think we can
> simplify some things a bit, and avoid
> any kind of "fail-fast" which will require user intervention. In fact I
> think we can avoid requiring the user to make
> any changes at all for KIP-726, so we don't have to worry about whether
> they actually read our documentation:
>
> Instead of making ["cooperative-sticky"] the default, we change the default
> to ["cooperative-sticky", "range"].
> Since "range" is the old default, this is equivalent to the first rolling
> bounce of the safe upgrade path in KIP-429.
>
> Of course this also means that under the current protocol selection
> mechanism we won't actually upgrade to
> cooperative rebalancing with the default assignor. But that's where
> KAFKA-12477 will come in.
>
> @Guozhang Wang <guozh...@confluent.io>  I'll get back to you with a
> concrete proposal and answer your questions, I just want to point out
> that it's possible to side-step the risk of users shooting themselves in
> the foot (well, at least in this one specific case,
> obviously they always find a way)
>
> On Tue, Mar 30, 2021 at 10:37 AM Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hi Sophie,
> >
> > My question is more related to KAFKA-12477, but since your latest replies
> > are on this thread I figured we can follow up in the same venue. Just so I
> > understand your latest comments above about the approach:
> >
> > * I think we would need to persist this decision so that the group would
> > never go back to the eager protocol; this bit would be written to the
> > internal topic's assignment message. Is that correct?
> > * Maybe you can describe what would happen, after the group has decided to
> > move forward with the cooperative protocol, when:
> > 1) a new member joins the group with the old version, and hence only
> > recognizes the eager protocol and executes the eager protocol on its first
> > rebalance.
> > 2) in addition to 1), the new member joins the group with the old version,
> > only recognizes the old subscription format, and is selected as the
> > leader.
> >
> > Guozhang
> >
> >
> >
> >
> > On Mon, Mar 29, 2021 at 10:30 PM Luke Chen <show...@gmail.com> wrote:
> >
> > > Hi Sophie & Ismael,
> > > Thank you for your feedback.
> > > No problem, let's pause this KIP and wait for this improvement:
> > > KAFKA-12477 <https://issues.apache.org/jira/browse/KAFKA-12477>.
> > >
> > > Stay tuned :)
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Tue, Mar 30, 2021 at 3:14 AM Ismael Juma <ism...@juma.me.uk> wrote:
> > >
> > > > Hi Sophie,
> > > >
> > > > I didn't analyze the KIP in detail, but the two suggestions you
> > > > mentioned sound like great improvements.
> > > >
> > > > A bit more context: breaking changes for a widely used product like
> > > > Kafka are costly, which is why we try as hard as we can to avoid them.
> > > > When it comes to the brokers, they are often managed by a central group
> > > > (or they're in the Cloud), so they're a bit easier to manage. Even so,
> > > > it's still possible to upgrade from 0.8.x directly to 2.7 since all
> > > > protocol versions are still supported. When it comes to the basic
> > > > clients (producer, consumer, admin client), they're often embedded in
> > > > applications, so we have to be even more conservative.
> > > >
> > > > Ismael
> > > >
> > > > On Mon, Mar 29, 2021 at 10:50 AM Sophie Blee-Goldman
> > > > <sop...@confluent.io.invalid> wrote:
> > > >
> > > > > Ismael,
> > > > >
> > > > > It seems like, given 3.0 is a breaking release, we have to rely on
> > > > > users being aware of this and responsible enough to read the upgrade
> > > > > guide. Otherwise we could never ever make any breaking changes beyond
> > > > > just removing deprecated APIs or other compilation-breaking errors
> > > > > that would be immediately visible, no?
> > > > >
> > > > > That said, obviously it's better to have a circuit-breaker that will
> > > > > fail fast in case of a user misconfiguration rather than silently
> > > > > corrupting the consumer group state -- e.g. letting two consumers
> > > > > overlap in their ownership of the same partition(s). We could
> > > > > definitely implement this, and now that I think about it this might
> > > > > solve a related problem in KAFKA-12477
> > > > > <https://issues.apache.org/jira/browse/KAFKA-12477>. We just add a
> > > > > new field to the Assignment in which the group leader indicates
> > > > > whether it's on a recent enough version to understand cooperative
> > > > > rebalancing. If an upgraded member joins the group, it'll only be
> > > > > allowed to start following the new rebalancing protocol after
> > > > > receiving the go-ahead from the group leader.
> > > > >
> > > > > If we do go ahead and add this new field in the Assignment then I'm
> > > > > pretty confident we can reduce the number of required rolling bounces
> > > > > to just one with KAFKA-12477
> > > > > <https://issues.apache.org/jira/browse/KAFKA-12477>. In that case we
> > > > > should be in much better shape to feel good about changing the
> > > > > default to the CooperativeStickyAssignor. How does that sound?
> > > > >
> > > > > To be clear, I'm not proposing we do this as part of KIP-726. Here's
> > > > > my take:
> > > > >
> > > > > Let's pause this KIP while I work on making these two improvements in
> > > > > KAFKA-12477 <https://issues.apache.org/jira/browse/KAFKA-12477>. Once
> > > > > I can confirm the short-circuit and single rolling bounce will be
> > > > > available for 3.0, I'll report back on this thread. Then we can move
> > > > > forward with this KIP again.
> > > > >
> > > > > Thoughts?
> > > > > Sophie
> > > > >
> > > > > On Mon, Mar 29, 2021 at 12:01 AM Luke Chen <show...@gmail.com> wrote:
> > > > >
> > > > > > Hi Ismael,
> > > > > > Thanks for the good questions. Answers below:
> > > > > > *1. Are we saying that every consumer upgraded would have to follow
> > > > > > the complex path described in the KIP?*
> > > > > > --> We suggest that every consumer go through these two rolling
> > > > > > upgrades. After KAFKA-12477
> > > > > > <https://issues.apache.org/jira/browse/KAFKA-12477> is completed,
> > > > > > this can be reduced to one rolling upgrade.
> > > > > >
> > > > > > *2. What happens if they don't read the instructions and upgrade as
> > > > > > they have in the past?*
> > > > > > --> The reason we want two rolling upgrades is to avoid the
> > > > > > situation where the leader is on old bytecode and only recognizes
> > > > > > "eager", but due to compatibility is still able to deserialize the
> > > > > > new protocol data from newer-versioned members, and hence just goes
> > > > > > ahead and does the assignment while the newer-versioned members have
> > > > > > not revoked their partitions before joining the group.
> > > > > >
> > > > > > But I'd say the new default assignor, "CooperativeStickyAssignor",
> > > > > > was already introduced in 2.4.0, which should be long enough ago for
> > > > > > users to have upgraded to bytecode that recognizes the
> > > > > > "cooperative" protocol.
> > > > > >
> > > > > > What do you think?
> > > > > >
> > > > > > Thank you.
> > > > > > Luke
> > > > > >
> > > > > > On Mon, Mar 29, 2021 at 12:14 PM Ismael Juma <ism...@juma.me.uk> wrote:
> > > > > >
> > > > > > > Thanks for the KIP. Are we saying that every consumer upgraded
> > > > > > > would have to follow the complex path described in the KIP? Also,
> > > > > > > what happens if they don't read the instructions and upgrade as
> > > > > > > they have in the past?
> > > > > > >
> > > > > > > Ismael
> > > > > > >
> > > > > > > On Fri, Mar 26, 2021, 1:53 AM Luke Chen <show...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > > <Update the subject>
> > > > > > > >
> > > > > > > > I'd like to discuss the following proposal to make the
> > > > > > > > CooperativeStickyAssignor the default assignor.
> > > > > > > >
> > > > > > > >
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-726%3A+Make+the+CooperativeStickyAssignor+as+the+default+assignor
> > > > > > > >
> > > > > > > > Any comments are welcome.
> > > > > > > >
> > > > > > > > Thank you.
> > > > > > > > Luke
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>
