Re: [DISCUSS] KIP-134: Delay initial consumer group rebalance

Damian Guy Fri, 24 Mar 2017 05:31:24 -0700

Thanks for the feedback.

Ewen: I'm happy to make it a client side config. Other than the protocol
bump i think the effort is almost the same. Personally i see no other
issues, but based on discussions with others this is what we came up with.


True, it can probably be tested easily via an integration test.

Matthias: Yes i agree, the delay could be extended as each new member joins
the group.

Thanks,
Damian

On Fri, 24 Mar 2017 at 05:14 Ewen Cheslack-Postava <e...@confluent.io>
wrote:

> I have the same initial response as Ismael re: broker vs consumer settings.
> The global setting seems questionable.
>
> Could we maybe summarize what the impact of making this a client config
> would be? Protocol bump is obvious, but is there any other significant
> issue? For the protocol bump in particular, I think this change is
> currently really critical for streams; it will be valuable elsewhere, but
> the immediate demand is streams, so a protocol bump while being backwards
> compatible wouldn't affect any other clients. Is this still actually
> compatible with different clients given that they would now expect
> different timeouts? (I think it's strictly compatible if you wait for
> responses, but if you enforce any client side timeouts, I'm not so sure.)
>
> re: test plan, I'm sure this will come as a surprise, but is the system
> test even necessary? Validating # of rebalances seems messy as other things
> can cause rebalances (though admittedly not in a "clean" case). But really
> it seems like an integration test could validate this by making sure only 1
> rebalance occurred when 2 members joined with a sufficient time gap.
>
> -Ewen
>
> On Thu, Mar 23, 2017 at 3:53 PM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
> > Thanks for the KIP Damian!
> >
> > My two cents:
> >
> >  - we should have an explicit parameter for this -- implicit setting are
> > always tricky (the "importance" of this parameter would be LOW)
> >
> >  - the config should be different for each consumer group:
> >    * assume you have a stateless app, you want to rebalance immediately
> >    * if you start-up in an visualized environment using some tools like
> > Mesos you might need a different value that on bare metal (no VM to be
> > started)
> >    * it also depends, how many consumer instanced you expect -- it's
> > harder to start up 100 instances in 3 seconds than 5
> >
> >  - the default value should be zero
> >
> >
> > One more thought: what about scaling scenarios? If a consumer group has
> > 10 instanced and should be scaled up to 20, it would make sense to do
> > this with a single rebalance, too. Thus, I am wondering, if it would
> > make sense to apply this delay each time a new consumer joins group,
> > even if the group is not empty?
> >
> >
> > -Matthias
> >
> >
> > On 3/23/17 10:19 AM, Damian Guy wrote:
> > > Thanks Gouzhang - i think another problem with this is that is
> > overloading
> > > session.timeout.ms to mean multiple things. I'm not sure that is a
> good
> > > thing.
> > >
> > > On Thu, 23 Mar 2017 at 17:14 Guozhang Wang <wangg...@gmail.com> wrote:
> > >
> > >> The downside of it, though, is that although it "hides" this from most
> > of
> > >> the users needing to be aware of it, by default session timeout i.e.
> the
> > >> rebalance timeout is 10 seconds which could arguably too long.
> > >>
> > >>
> > >> Guozhang
> > >>
> > >> On Thu, Mar 23, 2017 at 10:12 AM, Guozhang Wang <wangg...@gmail.com>
> > >> wrote:
> > >>
> > >>> Just throwing another alternative idea here: we can consider using
> the
> > >>> rebalance timeout value which is already included in the join request
> > >>> protocol (and on the current Java client it is always written as the
> > >>> session timeout value), that the first member joining will always
> force
> > >> the
> > >>> coordinator to wait that long. By doing this we do not need to bump
> up
> > >> the
> > >>> protocol either.
> > >>>
> > >>>
> > >>> Guozhang
> > >>>
> > >>> On Thu, Mar 23, 2017 at 5:49 AM, Damian Guy <damian....@gmail.com>
> > >> wrote:
> > >>>
> > >>>> Hi Ismael,
> > >>>>
> > >>>> Mostly to avoid the protocol bump.
> > >>>>
> > >>>> I agree that it may be difficult to choose the right delay for all
> > >>>> consumer
> > >>>> groups, but we wanted to make this something that most users don't
> > >> really
> > >>>> need to think about, i.e., a small enough default delay that works
> in
> > >> the
> > >>>> majority of cases. However it would be much more flexible as a
> > consumer
> > >>>> config, which i'm happy to pursue if this change is worthy of a
> > protocol
> > >>>> bump.
> > >>>>
> > >>>> Thanks,
> > >>>> Damian
> > >>>>
> > >>>> On Thu, 23 Mar 2017 at 12:35 Ismael Juma <ism...@juma.me.uk> wrote:
> > >>>>
> > >>>>> Thanks for the KIP, Damian. It makes sense to avoid multiple
> > >> rebalances
> > >>>>> during start-up. One issue with having this as a broker config is
> > that
> > >>>> it
> > >>>>> may be difficult to choose the right delay for all consumer groups.
> > >> Can
> > >>>> you
> > >>>>> elaborate a little more on why the first alternative (add a
> consumer
> > >>>>> config) was rejected? We bump protocol versions regularly (when it
> > >> makes
> > >>>>> sense), so it would be good to get a bit more detail.
> > >>>>>
> > >>>>> Thanks,
> > >>>>> Ismael
> > >>>>>
> > >>>>> On Thu, Mar 23, 2017 at 12:24 PM, Damian Guy <damian....@gmail.com
> >
> > >>>> wrote:
> > >>>>>
> > >>>>>> Hi All,
> > >>>>>>
> > >>>>>> I've prepared a KIP to add a configurable delay to the initial
> > >>>> consumer
> > >>>>>> group rebalance.
> > >>>>>>
> > >>>>>> Please have look here:
> > >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > >>>>>> 134%3A+Delay+initial+consumer+group+rebalance
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>> Damian
> > >>>>>>
> > >>>>>> BTW, i apologize if this appears twice. Seems the first one may
> have
> > >>>> not
> > >>>>>> made it.
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> -- Guozhang
> > >>>
> > >>
> > >>
> > >>
> > >> --
> > >> -- Guozhang
> > >>
> > >
> >
> >
>

Re: [DISCUSS] KIP-134: Delay initial consumer group rebalance

Reply via email to