@Becket:

I am not sure, if I understand this correctly. Instead of applying a
fixed delay, that starts when the first consumer of an (empty) group
joins, you suggest to re-trigger/re-set the delay each time a new
consumer joins?

This sound like a good strategy to me, if the config is on the broker side.

@Eno:

I think that's a valid point and I like this idea!


-Matthias


On 3/24/17 1:23 PM, Eno Thereska wrote:
> Thanks Damian,
> 
> This KIP deals with the initial phase only. What about the cases when several 
> consumers leave a group? Won't there be several expensive rebalances then as 
> well? I'm wondering if it makes sense for the delay to hold anytime the "set" 
> of consumers in a group changes, be it addition to the group or removal from 
> group.
> 
> Thanks
> Eno
> 
> 
>> On 24 Mar 2017, at 20:04, Becket Qin <[email protected]> wrote:
>>
>> Thanks for the KIP, Damian.
>>
>> My two cents on this. It seems there are two things worth thinking here:
>>
>> 1. Better rebalance timing. We will try to rebalance only when all the
>> consumers in a group have joined. The challenge would be someone has to
>> define what does ALL consumers mean, it could either be a time or number of
>> consumers, etc.
>>
>> 2. Avoid frequent rebalance. For example, if there are 100 consumers in a
>> group, today, in the worst case, we may end up with 100 rebalances even if
>> all the consumers joined the group in a reasonably small amount of time.
>> Frequent rebalance is also a bad thing for brokers.
>>
>> Having a client side configuration may solve problem 1 better because each
>> consumer group can potentially configure their own timing. However, it does
>> not really prevent frequent rebalance in general because some of the
>> consumers can be misconfigured. (This may have something to do with KIP-124
>> as well. But if quota is applied on the JoinGroup/SyncGroup request it may
>> cause some unwanted cascading effects.)
>>
>> Having a broker side configuration may result in less flexibility for each
>> consumer group, but it can prevent frequent rebalance better. I think with
>> some reasonable design, the rebalance timing issue can be resolved on the
>> broker side as well. Matthias had a good point on extending the delay when
>> a new consumer joins a group (we actually did something similar to batch
>> ISR change propagation). For example, let's say on the broker side, we will
>> always delay 2 seconds each time we see a new consumer joining a consumer
>> group. This would probably work for most of the consumer groups and will
>> also limit the rebalance frequency to protect the brokers.
>>
>> I am not sure about the streams use case here, but if something like 2
>> seconds of delay is acceptable for streams, I would prefer adding the
>> configuration to the broker so that we can address both problems.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>>
>> On Fri, Mar 24, 2017 at 5:30 AM, Damian Guy <[email protected]> wrote:
>>
>>> Thanks for the feedback.
>>>
>>> Ewen: I'm happy to make it a client side config. Other than the protocol
>>> bump i think the effort is almost the same. Personally i see no other
>>> issues, but based on discussions with others this is what we came up with.
>>>
>>> True, it can probably be tested easily via an integration test.
>>>
>>> Matthias: Yes i agree, the delay could be extended as each new member joins
>>> the group.
>>>
>>> Thanks,
>>> Damian
>>>
>>> On Fri, 24 Mar 2017 at 05:14 Ewen Cheslack-Postava <[email protected]>
>>> wrote:
>>>
>>>> I have the same initial response as Ismael re: broker vs consumer
>>> settings.
>>>> The global setting seems questionable.
>>>>
>>>> Could we maybe summarize what the impact of making this a client config
>>>> would be? Protocol bump is obvious, but is there any other significant
>>>> issue? For the protocol bump in particular, I think this change is
>>>> currently really critical for streams; it will be valuable elsewhere, but
>>>> the immediate demand is streams, so a protocol bump while being backwards
>>>> compatible wouldn't affect any other clients. Is this still actually
>>>> compatible with different clients given that they would now expect
>>>> different timeouts? (I think it's strictly compatible if you wait for
>>>> responses, but if you enforce any client side timeouts, I'm not so sure.)
>>>>
>>>> re: test plan, I'm sure this will come as a surprise, but is the system
>>>> test even necessary? Validating # of rebalances seems messy as other
>>> things
>>>> can cause rebalances (though admittedly not in a "clean" case). But
>>> really
>>>> it seems like an integration test could validate this by making sure
>>> only 1
>>>> rebalance occurred when 2 members joined with a sufficient time gap.
>>>>
>>>> -Ewen
>>>>
>>>> On Thu, Mar 23, 2017 at 3:53 PM, Matthias J. Sax <[email protected]>
>>>> wrote:
>>>>
>>>>> Thanks for the KIP Damian!
>>>>>
>>>>> My two cents:
>>>>>
>>>>> - we should have an explicit parameter for this -- implicit setting
>>> are
>>>>> always tricky (the "importance" of this parameter would be LOW)
>>>>>
>>>>> - the config should be different for each consumer group:
>>>>>   * assume you have a stateless app, you want to rebalance immediately
>>>>>   * if you start-up in an visualized environment using some tools like
>>>>> Mesos you might need a different value that on bare metal (no VM to be
>>>>> started)
>>>>>   * it also depends, how many consumer instanced you expect -- it's
>>>>> harder to start up 100 instances in 3 seconds than 5
>>>>>
>>>>> - the default value should be zero
>>>>>
>>>>>
>>>>> One more thought: what about scaling scenarios? If a consumer group has
>>>>> 10 instanced and should be scaled up to 20, it would make sense to do
>>>>> this with a single rebalance, too. Thus, I am wondering, if it would
>>>>> make sense to apply this delay each time a new consumer joins group,
>>>>> even if the group is not empty?
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>
>>>>> On 3/23/17 10:19 AM, Damian Guy wrote:
>>>>>> Thanks Gouzhang - i think another problem with this is that is
>>>>> overloading
>>>>>> session.timeout.ms to mean multiple things. I'm not sure that is a
>>>> good
>>>>>> thing.
>>>>>>
>>>>>> On Thu, 23 Mar 2017 at 17:14 Guozhang Wang <[email protected]>
>>> wrote:
>>>>>>
>>>>>>> The downside of it, though, is that although it "hides" this from
>>> most
>>>>> of
>>>>>>> the users needing to be aware of it, by default session timeout i.e.
>>>> the
>>>>>>> rebalance timeout is 10 seconds which could arguably too long.
>>>>>>>
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
>>>>>>> On Thu, Mar 23, 2017 at 10:12 AM, Guozhang Wang <[email protected]
>>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Just throwing another alternative idea here: we can consider using
>>>> the
>>>>>>>> rebalance timeout value which is already included in the join
>>> request
>>>>>>>> protocol (and on the current Java client it is always written as
>>> the
>>>>>>>> session timeout value), that the first member joining will always
>>>> force
>>>>>>> the
>>>>>>>> coordinator to wait that long. By doing this we do not need to bump
>>>> up
>>>>>>> the
>>>>>>>> protocol either.
>>>>>>>>
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>> On Thu, Mar 23, 2017 at 5:49 AM, Damian Guy <[email protected]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Ismael,
>>>>>>>>>
>>>>>>>>> Mostly to avoid the protocol bump.
>>>>>>>>>
>>>>>>>>> I agree that it may be difficult to choose the right delay for all
>>>>>>>>> consumer
>>>>>>>>> groups, but we wanted to make this something that most users don't
>>>>>>> really
>>>>>>>>> need to think about, i.e., a small enough default delay that works
>>>> in
>>>>>>> the
>>>>>>>>> majority of cases. However it would be much more flexible as a
>>>>> consumer
>>>>>>>>> config, which i'm happy to pursue if this change is worthy of a
>>>>> protocol
>>>>>>>>> bump.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Damian
>>>>>>>>>
>>>>>>>>> On Thu, 23 Mar 2017 at 12:35 Ismael Juma <[email protected]>
>>> wrote:
>>>>>>>>>
>>>>>>>>>> Thanks for the KIP, Damian. It makes sense to avoid multiple
>>>>>>> rebalances
>>>>>>>>>> during start-up. One issue with having this as a broker config is
>>>>> that
>>>>>>>>> it
>>>>>>>>>> may be difficult to choose the right delay for all consumer
>>> groups.
>>>>>>> Can
>>>>>>>>> you
>>>>>>>>>> elaborate a little more on why the first alternative (add a
>>>> consumer
>>>>>>>>>> config) was rejected? We bump protocol versions regularly (when
>>> it
>>>>>>> makes
>>>>>>>>>> sense), so it would be good to get a bit more detail.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> Ismael
>>>>>>>>>>
>>>>>>>>>> On Thu, Mar 23, 2017 at 12:24 PM, Damian Guy <
>>> [email protected]
>>>>>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi All,
>>>>>>>>>>>
>>>>>>>>>>> I've prepared a KIP to add a configurable delay to the initial
>>>>>>>>> consumer
>>>>>>>>>>> group rebalance.
>>>>>>>>>>>
>>>>>>>>>>> Please have look here:
>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>> 134%3A+Delay+initial+consumer+group+rebalance
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Damian
>>>>>>>>>>>
>>>>>>>>>>> BTW, i apologize if this appears twice. Seems the first one may
>>>> have
>>>>>>>>> not
>>>>>>>>>>> made it.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> -- Guozhang
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -- Guozhang
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
> 

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to