Hey Guozhang,

Usually I think such naming inconsistencies are best avoided. It adds
another level of confusion for people who have to dip into the code, figure
out a problem, and ultimately explain it. Since we already have the
PreparingRebalance state, maybe we could just rename the AwaitingSync state
to CompletingRebalance?

-Jason

On Thu, Aug 3, 2017 at 6:09 PM, Guozhang Wang <wangg...@gmail.com> wrote:

> From an ops person's view who are mostly likely watching the metrics these
> names may not be very clear as people may not know the internals well. I'd
> prefer PrepareRebalance and CompleteRebalance since they may be easier to
> understand thought not 100 percent accurately match to internal
> implementation.
>
>
> Guozhang
>
>
> On Thu, Aug 3, 2017 at 4:14 PM, Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hey Colin, Guozhang,
> >
> > I agree the current state names are not ideal for end users. I tend to
> see
> > the rebalance as joining the group and receiving the assignment. Maybe
> the
> > states could be named in those terms? For example: RebalanceJoin and
> > RebalanceAssignment. What do you think?
> >
> > -Jason
> >
> > On Fri, Jul 28, 2017 at 11:18 AM, Guozhang Wang <wangg...@gmail.com>
> > wrote:
> >
> > > I feel we can change `AwaitSync` to `completeRebalance` while keeping
> the
> > > other as is.
> > >
> > > cc Jason?
> > >
> > > Guozhang
> > >
> > > On Fri, Jul 28, 2017 at 10:08 AM, Colin McCabe <cmcc...@apache.org>
> > wrote:
> > >
> > >> Thanks for the explanation.  I guess maybe we should just keep the
> group
> > >> names as they are, then?
> > >>
> > >> best,
> > >> Colin
> > >>
> > >>
> > >> On Wed, Jul 26, 2017, at 11:25, Guozhang Wang wrote:
> > >> > To me `PreparingRebalance` sounds better than `StartingRebalance`
> > since
> > >> > only by the end of that stage we have formed a new group. More
> > >> > specifically, this this the workflow from the coordinator's point of
> > >> > view:
> > >> >
> > >> > 1. decided to trigger a rebalance, enter PreparingRebalance phase;
> > >> >                   |
> > >> >                   |   send out error code for all heartbeat reponses
> > >> >                  \|/
> > >> >                   |
> > >> >                   |   waiting for join group requests from members
> > >> >                  \|/
> > >> > 2. formed a new group, increment the generation number, now start
> > >> > rebalancing, entering AwaitSync phase:
> > >> >                   |
> > >> >                   |   send out the join group responses for whoever
> > >> > requested join
> > >> >                  \|/
> > >> >                   |
> > >> >                   |   waiting for the sync group request from the
> > leader
> > >> >                  \|/
> > >> > 3. received assignment from the leader; the rebalance has ended,
> start
> > >> > ticking for all members, entering Stable phase.
> > >> >                   |
> > >> >                   |   for whoever else sending the sync group
> request,
> > >> > reply with the assignment
> > >> >                  \|/
> > >> >
> > >> > So from the coordinator's point of view the rebalance starts at
> > >> beginning
> > >> > of step 2 and ends at beginning of step 3. Maybe we can rename
> > >> > `AwaitSync`
> > >> > itself to `CompletingRebalance`.
> > >> >
> > >> > Guozhang
> > >> >
> > >> >
> > >> >
> > >> > On Tue, Jul 25, 2017 at 6:44 AM, Ismael Juma <ism...@juma.me.uk>
> > wrote:
> > >> >
> > >> > > Hi Guozhang,
> > >> > >
> > >> > > Thanks for the clarification. The naming does seem a bit unclear.
> > >> Maybe
> > >> > > `PreparingRebalance` could be `StartingRebalance` or something
> that
> > >> makes
> > >> > > it clear that it is part of the rebalance instead of a step before
> > the
> > >> > > actual rebalance. `AwaitingSync` could also be
> > `CompletingRebalance`,
> > >> but
> > >> > > not sure if that's better.
> > >> > >
> > >> > > Ismael
> > >> > >
> > >> > > On Mon, Jul 24, 2017 at 7:02 PM, Guozhang Wang <
> wangg...@gmail.com>
> > >> wrote:
> > >> > >
> > >> > > > Actually Rebalancing includes two steps, and we name them
> > >> > > PrepareRebalance
> > >> > > > and WaitSync (arguably they may not be the best names). But
> these
> > >> two
> > >> > > steps
> > >> > > > together should be treated as the complete rebalance cycle.
> > >> > > >
> > >> > > >
> > >> > > > Guozhang
> > >> > > >
> > >> > > > On Mon, Jul 24, 2017 at 10:46 AM, Colin McCabe <
> > cmcc...@apache.org>
> > >> > > wrote:
> > >> > > >
> > >> > > > > Hi all,
> > >> > > > >
> > >> > > > > I think maybe it makes sense to rename the
> "PreparingRebalance"
> > >> > > consumer
> > >> > > > > group state to "Rebalancing."  To me, "Preparing" implies that
> > >> there
> > >> > > > > will be a later "rebalance" state that follows-- but there is
> > not.
> > >> > > > > Since we're now exposing this state name publicly in these
> > >> metrics,
> > >> > > > > perhaps it makes sense to do this rename now.  Thoughts?
> > >> > > > >
> > >> > > > > best,
> > >> > > > > Colin
> > >> > > > >
> > >> > > > >
> > >> > > > > On Fri, Jul 21, 2017, at 13:52, Colin McCabe wrote:
> > >> > > > > > That's a good point.  I revised the KIP to add metrics for
> all
> > >> the
> > >> > > > group
> > >> > > > > > states.
> > >> > > > > >
> > >> > > > > > best,
> > >> > > > > > Colin
> > >> > > > > >
> > >> > > > > >
> > >> > > > > > On Fri, Jul 21, 2017, at 12:08, Guozhang Wang wrote:
> > >> > > > > > > Ah, that's right Jason.
> > >> > > > > > >
> > >> > > > > > > With that I can be convinced to add one metric per each
> > state.
> > >> > > > > > >
> > >> > > > > > > Guozhang
> > >> > > > > > >
> > >> > > > > > > On Fri, Jul 21, 2017 at 11:44 AM, Jason Gustafson <
> > >> > > > ja...@confluent.io>
> > >> > > > > > > wrote:
> > >> > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > "Dead" and "Empty" states are transient: groups
> usually
> > >> only
> > >> > > > > leaves in
> > >> > > > > > > > this
> > >> > > > > > > > > state for a short while and then being deleted or
> > >> transited to
> > >> > > > > other
> > >> > > > > > > > > states.
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > This is not strictly true for the "Empty" state which we
> > >> also use
> > >> > > > to
> > >> > > > > > > > represent simple groups which only use the coordinator
> to
> > >> store
> > >> > > > > offsets. I
> > >> > > > > > > > think we may as well cover all the states if we're going
> > to
> > >> cover
> > >> > > > > any of
> > >> > > > > > > > them specifically.
> > >> > > > > > > >
> > >> > > > > > > > -Jason
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > On Fri, Jul 21, 2017 at 9:45 AM, Guozhang Wang <
> > >> > > wangg...@gmail.com
> > >> > > > >
> > >> > > > > wrote:
> > >> > > > > > > >
> > >> > > > > > > > > My two cents:
> > >> > > > > > > > >
> > >> > > > > > > > > "Dead" and "Empty" states are transient: groups
> usually
> > >> only
> > >> > > > > leaves in
> > >> > > > > > > > this
> > >> > > > > > > > > state for a short while and then being deleted or
> > >> transited to
> > >> > > > > other
> > >> > > > > > > > > states.
> > >> > > > > > > > >
> > >> > > > > > > > > Since we have the existing "*NumGroups*" metric,
> > >> `*NumGroups -
> > >> > > > > > > > > **NumGroupsRebalancing
> > >> > > > > > > > > - **NumGroupsAwaitingSync`* should cover the above
> > three,
> > >> where
> > >> > > > > "Stable"
> > >> > > > > > > > > should be contributing most of the counts: If we have
> a
> > >> bug
> > >> > > that
> > >> > > > > causes
> > >> > > > > > > > the
> > >> > > > > > > > > num.Dead / Empty to keep increasing, then we would
> > observe
> > >> > > > > `NumGroups`
> > >> > > > > > > > keep
> > >> > > > > > > > > increasing which should be sufficient for alerting.
> And
> > >> trouble
> > >> > > > > shooting
> > >> > > > > > > > of
> > >> > > > > > > > > the issue could be relying on the log4j.
> > >> > > > > > > > >
> > >> > > > > > > > > *Guozhang*
> > >> > > > > > > > >
> > >> > > > > > > > > On Fri, Jul 21, 2017 at 7:19 AM, Ismael Juma <
> > >> > > ism...@juma.me.uk>
> > >> > > > > wrote:
> > >> > > > > > > > >
> > >> > > > > > > > > > Thanks for the KIP, Colin. This will definitely be
> > >> useful.
> > >> > > One
> > >> > > > > > > > question:
> > >> > > > > > > > > > would it be useful to have a metric for for the
> number
> > >> of
> > >> > > > groups
> > >> > > > > in
> > >> > > > > > > > each
> > >> > > > > > > > > > possible state? The KIP suggests
> "PreparingRebalance"
> > >> and
> > >> > > > > > > > "AwaitingSync".
> > >> > > > > > > > > > That leaves "Stable", "Dead" and "Empty". Are those
> > not
> > >> > > useful?
> > >> > > > > > > > > >
> > >> > > > > > > > > > Ismael
> > >> > > > > > > > > >
> > >> > > > > > > > > > On Thu, Jul 20, 2017 at 6:52 PM, Colin McCabe <
> > >> > > > > cmcc...@apache.org>
> > >> > > > > > > > > wrote:
> > >> > > > > > > > > >
> > >> > > > > > > > > > > Hi all,
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > I posted "KIP-180: Add a broker metric specifying
> > the
> > >> > > number
> > >> > > > of
> > >> > > > > > > > > consumer
> > >> > > > > > > > > > > group rebalances in progress" for discussion:
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > https://cwiki.apache.org/confl
> > >> uence/display/KAFKA/KIP-
> > >> > > > > > > > > > > 180%3A+Add+a+broker+metric+
> > specifying+the+number+of+
> > >> > > > > > > > > > > consumer+group+rebalances+in+progress
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > Check it out.
> > >> > > > > > > > > > >
> > >> > > > > > > > > > > cheers,
> > >> > > > > > > > > > > Colin
> > >> > > > > > > > > > >
> > >> > > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > >
> > >> > > > > > > > > --
> > >> > > > > > > > > -- Guozhang
> > >> > > > > > > > >
> > >> > > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > --
> > >> > > > > > > -- Guozhang
> > >> > > > >
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > --
> > >> > > > -- Guozhang
> > >> > > >
> > >> > >
> > >> >
> > >> >
> > >> >
> > >> > --
> > >> > -- Guozhang
> > >>
> > >
> > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
>
> --
> -- Guozhang
>

Reply via email to