Thanks for the feedback. Ewen: I'm happy to make it a client side config. Other than the protocol bump i think the effort is almost the same. Personally i see no other issues, but based on discussions with others this is what we came up with.
True, it can probably be tested easily via an integration test. Matthias: Yes i agree, the delay could be extended as each new member joins the group. Thanks, Damian On Fri, 24 Mar 2017 at 05:14 Ewen Cheslack-Postava <e...@confluent.io> wrote: > I have the same initial response as Ismael re: broker vs consumer settings. > The global setting seems questionable. > > Could we maybe summarize what the impact of making this a client config > would be? Protocol bump is obvious, but is there any other significant > issue? For the protocol bump in particular, I think this change is > currently really critical for streams; it will be valuable elsewhere, but > the immediate demand is streams, so a protocol bump while being backwards > compatible wouldn't affect any other clients. Is this still actually > compatible with different clients given that they would now expect > different timeouts? (I think it's strictly compatible if you wait for > responses, but if you enforce any client side timeouts, I'm not so sure.) > > re: test plan, I'm sure this will come as a surprise, but is the system > test even necessary? Validating # of rebalances seems messy as other things > can cause rebalances (though admittedly not in a "clean" case). But really > it seems like an integration test could validate this by making sure only 1 > rebalance occurred when 2 members joined with a sufficient time gap. > > -Ewen > > On Thu, Mar 23, 2017 at 3:53 PM, Matthias J. Sax <matth...@confluent.io> > wrote: > > > Thanks for the KIP Damian! > > > > My two cents: > > > > - we should have an explicit parameter for this -- implicit setting are > > always tricky (the "importance" of this parameter would be LOW) > > > > - the config should be different for each consumer group: > > * assume you have a stateless app, you want to rebalance immediately > > * if you start-up in an visualized environment using some tools like > > Mesos you might need a different value that on bare metal (no VM to be > > started) > > * it also depends, how many consumer instanced you expect -- it's > > harder to start up 100 instances in 3 seconds than 5 > > > > - the default value should be zero > > > > > > One more thought: what about scaling scenarios? If a consumer group has > > 10 instanced and should be scaled up to 20, it would make sense to do > > this with a single rebalance, too. Thus, I am wondering, if it would > > make sense to apply this delay each time a new consumer joins group, > > even if the group is not empty? > > > > > > -Matthias > > > > > > On 3/23/17 10:19 AM, Damian Guy wrote: > > > Thanks Gouzhang - i think another problem with this is that is > > overloading > > > session.timeout.ms to mean multiple things. I'm not sure that is a > good > > > thing. > > > > > > On Thu, 23 Mar 2017 at 17:14 Guozhang Wang <wangg...@gmail.com> wrote: > > > > > >> The downside of it, though, is that although it "hides" this from most > > of > > >> the users needing to be aware of it, by default session timeout i.e. > the > > >> rebalance timeout is 10 seconds which could arguably too long. > > >> > > >> > > >> Guozhang > > >> > > >> On Thu, Mar 23, 2017 at 10:12 AM, Guozhang Wang <wangg...@gmail.com> > > >> wrote: > > >> > > >>> Just throwing another alternative idea here: we can consider using > the > > >>> rebalance timeout value which is already included in the join request > > >>> protocol (and on the current Java client it is always written as the > > >>> session timeout value), that the first member joining will always > force > > >> the > > >>> coordinator to wait that long. By doing this we do not need to bump > up > > >> the > > >>> protocol either. > > >>> > > >>> > > >>> Guozhang > > >>> > > >>> On Thu, Mar 23, 2017 at 5:49 AM, Damian Guy <damian....@gmail.com> > > >> wrote: > > >>> > > >>>> Hi Ismael, > > >>>> > > >>>> Mostly to avoid the protocol bump. > > >>>> > > >>>> I agree that it may be difficult to choose the right delay for all > > >>>> consumer > > >>>> groups, but we wanted to make this something that most users don't > > >> really > > >>>> need to think about, i.e., a small enough default delay that works > in > > >> the > > >>>> majority of cases. However it would be much more flexible as a > > consumer > > >>>> config, which i'm happy to pursue if this change is worthy of a > > protocol > > >>>> bump. > > >>>> > > >>>> Thanks, > > >>>> Damian > > >>>> > > >>>> On Thu, 23 Mar 2017 at 12:35 Ismael Juma <ism...@juma.me.uk> wrote: > > >>>> > > >>>>> Thanks for the KIP, Damian. It makes sense to avoid multiple > > >> rebalances > > >>>>> during start-up. One issue with having this as a broker config is > > that > > >>>> it > > >>>>> may be difficult to choose the right delay for all consumer groups. > > >> Can > > >>>> you > > >>>>> elaborate a little more on why the first alternative (add a > consumer > > >>>>> config) was rejected? We bump protocol versions regularly (when it > > >> makes > > >>>>> sense), so it would be good to get a bit more detail. > > >>>>> > > >>>>> Thanks, > > >>>>> Ismael > > >>>>> > > >>>>> On Thu, Mar 23, 2017 at 12:24 PM, Damian Guy <damian....@gmail.com > > > > >>>> wrote: > > >>>>> > > >>>>>> Hi All, > > >>>>>> > > >>>>>> I've prepared a KIP to add a configurable delay to the initial > > >>>> consumer > > >>>>>> group rebalance. > > >>>>>> > > >>>>>> Please have look here: > > >>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > >>>>>> 134%3A+Delay+initial+consumer+group+rebalance > > >>>>>> > > >>>>>> Thanks, > > >>>>>> Damian > > >>>>>> > > >>>>>> BTW, i apologize if this appears twice. Seems the first one may > have > > >>>> not > > >>>>>> made it. > > >>>>>> > > >>>>> > > >>>> > > >>> > > >>> > > >>> > > >>> -- > > >>> -- Guozhang > > >>> > > >> > > >> > > >> > > >> -- > > >> -- Guozhang > > >> > > > > > > > >