Hello Boyang,

On Wed, May 1, 2019 at 4:51 PM Boyang Chen <bche...@outlook.com> wrote:

> Hey Guozhang,
>
> thank you for the great write-up. Overall the motivation and changes LGTM,
> just some minor comments:
>
>
>   1.  In "Consumer Coordinator Algorithm", we could reorder alphabet
> points for 3d~3f from ["ready-to-migrate-partitions",
> "unknown-but-owned-partitions",  "maybe-revoking-partitions"] to
> ["maybe-revoking-partitions", "ready-to-migrate-partitions",
> "unknown-but-owned-partitions"] in order to be consistent with 3c1~3.
>

Ack. Updated.


>   2.  In "Consumer Coordinator Algorithm", 1c suggests to revoke all
> partition upon heartbeat/commit fail. What's the gain here? Do we want to
> keep all partitions running at this moment, to be optimistic for the case
> when no partitions get reassigned?
>

That's a good catch. When REBALANCE_IN_PROGRESS is received, we can just
re-join the group with all the currently owned partitions encoded. Updated.

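To make the intended behavior concrete, here is a rough sketch (method
and error-handling names are hypothetical, not the actual
ConsumerCoordinator code):

    // Sketch only: on REBALANCE_IN_PROGRESS from a heartbeat or commit
    // response, do NOT revoke anything; keep the owned partitions and
    // re-join, encoding them in the JoinGroup subscription so the
    // leader can compute an incremental assignment.
    void onHeartbeatOrCommitError(Errors error) {
        if (error == Errors.REBALANCE_IN_PROGRESS) {
            requestRejoin();  // owned partitions stay assigned meanwhile
        }
    }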

>   3.  In "Recommended Upgrade Procedure", remove extra 'those': " The
> 'sticky' assignor works even those there are "
>

Ack, should be `even when`.


>   4.  Put the two "looking into the future" items into a separate category
> from the migration section. It seems inconsistent for readers to see them
> before we have finished discussing everything else.
>

Ack.


>   5.  Have we discussed the concern about serialization? Could the new
> metadata we are adding grow larger than the message size cap?
>

We're completing https://issues.apache.org/jira/browse/KAFKA-7149, which
should greatly reduce the message size (I will update the wiki accordingly
as well).


>
> Boyang
>
> ________________________________
> From: Guozhang Wang <wangg...@gmail.com>
> Sent: Monday, April 15, 2019 9:20 AM
> To: dev
> Subject: Re: [DISCUSS] KIP-429 : Smooth Auto-Scaling for Kafka Streams
>
> Hello Jason,
>
> I agree with you that for range / round-robin it makes less sense to be
> compatible with the cooperative rebalance protocol.
>
> As for StickyAssignor, however, I think it would still be possible to
> make the current implementation compatible with cooperative rebalance.
> So after pondering on the different options at hand, I'm now proposing
> this approach, as listed in the upgrade section:
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Kafka+Consumer+Incremental+Rebalance+Protocol#KIP-429:KafkaConsumerIncrementalRebalanceProtocol-CompatibilityandUpgradePath
>
> The idea is to let each assignor specify which protocols it would work
> with, associated with a different name; then the upgrade path would
> involve a "compatible" protocol which actually still uses the eager
> behavior while encoding two assignors if possible. In the "Rejected"
> section (just to clarify, I'm not finalizing it as rejected, just
> putting it there for now; if we like this one instead we can always
> switch them) I listed the other approach we once discussed, arguing
> that its con of a duplicated class seems to outweigh the pros of
> saving the "rebalance.protocol" config.
>
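> To make this concrete, a rough sketch of what the assignor-side API
> could look like (all names below are illustrative, not final):
>
>     enum RebalanceProtocol { EAGER, COOPERATIVE }
>
>     interface PartitionAssignor {
>         // e.g. "sticky" (eager-compatible) vs. "cooperative-sticky"
>         String name();
>         // New: each assignor declares which rebalance protocols it
>         // can work with, so a common one can be picked at runtime.
>         java.util.List<RebalanceProtocol> supportedProtocols();
>     }
>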
> Let me know WDYT.
>
> Guozhang
>
> On Fri, Apr 12, 2019 at 6:08 PM Jason Gustafson <ja...@confluent.io>
> wrote:
>
> > Hi Guozhang,
> >
> > Responses below:
> >
> > > 2. The interface's default implementation will just be
> > > `onPartitionRevoked`, so for users' instantiations, if they do not
> > > make any code changes they should be able to recompile the code and
> > > continue.
> >
> >
> > Ack, makes sense.
> >
> > > 4. Hmm.. not sure if it will work. The main issue is that the
> > > consumer-coordinator behavior (whether to revoke all or none at
> > > onRebalancePrepare) is independent of the selected protocol's assignor
> > > (eager or cooperative), so even if the assignor is selected to be the
> > > old-versioned one, we will still not revoke at the consumer-coordinator
> > > layer and hence have the same risk of migrating still-owned partitions,
> > > right?
> >
> >
> > Yeah, basically we would have to push the eager/cooperative logic into
> the
> > PartitionAssignor itself and make the consumer aware of the rebalance
> > protocol it is compatible with. As long as an eager protocol _could_ be
> > selected, the consumer would have to be pessimistic and do eager
> > revocation. But if all the assignors configured in the consumer support
> > cooperative reassignment, then either 1) a cooperative protocol will be
> > selected and cooperative revocation can be safely used, or 2) if the rest
> > of the group does not support it, then the consumer will simply fail.
> >
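> > A minimal sketch of that decision (hypothetical names, assuming each
> > assignor can report the protocols it supports):
> >
> >     // If every configured assignor supports cooperative
> >     // reassignment, cooperative revocation is safe; otherwise the
> >     // consumer must stay pessimistic and revoke eagerly.
> >     boolean allCooperative = assignors.stream().allMatch(a ->
> >         a.supportedProtocols().contains(RebalanceProtocol.COOPERATIVE));
> >     RebalanceProtocol protocol = allCooperative
> >         ? RebalanceProtocol.COOPERATIVE
> >         : RebalanceProtocol.EAGER;
> >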
> > Another point which you raised offline and I will repeat here is that
> this
> > proposal's benefit is mostly limited to sticky assignment logic. Arguably
> > the range assignor may have some incidental stickiness, particularly if
> the
> > group is rebalancing for a newly created or deleted topic. For other
> cases,
> > the proposal is mostly additional overhead since it takes an additional
> > rebalance and many of the partitions will move. Perhaps it doesn't make
> as
> > much sense to use the cooperative protocol for strategies like range and
> > round-robin. That kind of argues in favor of pushing some of the control
> > into the assignor itself. Maybe we would not bother creating
> > CooperativeRange as I suggested above, but it would make sense to create
> a
> > cooperative version of the sticky assignment strategy. I thought we might
> > have to create a new sticky assignor anyway, because I can't see how we
> > would get compatible behavior when mixing with the old version.
> >
> > Thanks,
> > Jason
> >
> >
> > On Thu, Apr 11, 2019 at 5:53 PM Guozhang Wang <wangg...@gmail.com>
> wrote:
> >
> > > Hello Matthias:
> > >
> > > Thanks for your review.
> > >
> > > The background section uses the Streams assignor as well as the
> > > consumer's own sticky assignor as examples illustrating the situation,
> > > but this KIP is for the consumer coordinator itself, and the rest of
> > > the paragraph does not talk about Streams any more. If you feel it's a
> > > bit distracting I can remove those examples.
> > >
> > > 10). While working on the PR I realized that the revoked partitions on
> > > assignment are not needed (this is being discussed on the PR itself:
> > > https://github.com/apache/kafka/pull/6528#issuecomment-480009890)
> > >
> > > 20). 1.a. Good question, I've updated the wiki to let the consumer
> > > clean up its assignment and re-join, and not let the assignor make any
> > > proactive changes. The idea is to keep the logic simpler and not do
> > > any "split brain" stuff.
> > >
> > > 20). 2.b. No, we do not need it, since the owned partitions will
> > > already be part of the Subscription passed in to assign().
> > >
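> > > For illustration, a sketch of how an assignor could read them (the
> > > ownedPartitions() accessor is a hypothetical name at this point):
> > >
> > >     // Inside assign(): each member's Subscription already carries
> > >     // the partitions it owns, so no extra parameter is needed.
> > >     for (Map.Entry<String, Subscription> e : subscriptions.entrySet()) {
> > >         List<TopicPartition> owned = e.getValue().ownedPartitions();
> > >         // ... compute a sticky, incremental assignment from `owned`
> > >     }
> > >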
> > > 30). As Boyang mentioned, there are some drawbacks that still cannot
> > > be addressed by a rebalance delay, hence we still voted for KIP-345
> > > (some more details can be found on the discussion thread of KIP-345
> > > itself). One example is that as the instance resumes, its member id
> > > will be empty, so we are still relying on the assignor to give it the
> > > assignment from the old member-id while keeping all other members'
> > > assignments unchanged.
> > >
> > > 40). Incomplete sentence, I've updated it.
> > >
> > > 50). Here's my idea: suppose we augment the join group schema with
> > > `protocol version` in 2.3; then, with both brokers and clients being
> > > on version 2.3+, on the first rolling bounce where the subscription
> > > and assignment schema and / or user metadata has changed, this
> > > protocol version will be bumped. On the broker side, when receiving
> > > all members' join-group requests, it will choose the member that has
> > > the highest protocol version (this also assumes a higher versioned
> > > protocol is always backward compatible, i.e. the coordinator can
> > > recognize lower versioned protocols as well) and select it as the
> > > leader. Then the leader can decide, based on its received and
> > > deserialized subscription information, how to assign partitions and
> > > how to encode the assignment accordingly so that everyone can
> > > understand it. With this, in Streams for example, no version probing
> > > would be needed since we are guaranteed the leader knows everyone's
> > > version -- again it is assuming that a higher versioned protocol is
> > > always backward compatible -- and hence can successfully do the
> > > assignment in that round.
> > >
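> > > A rough sketch of that broker-side selection (names illustrative):
> > >
> > >     // Pick as leader the member advertising the highest protocol
> > >     // version, assuming higher versions are backward compatible.
> > >     MemberMetadata leader = members.stream()
> > >         .max(Comparator.comparingInt(MemberMetadata::protocolVersion))
> > >         .orElseThrow(IllegalStateException::new);
> > >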
> > > 60). My bad, this section was not updated as the design evolved;
> > > I've updated it now.
> > >
> > >
> > > On Tue, Apr 9, 2019 at 7:22 PM Boyang Chen <bche...@outlook.com>
> wrote:
> > >
> > > >
> > > > Thanks for the review Matthias! My 2 cents on the rebalance delay:
> > > > it is a rather fixed trade-off between task availability and
> > > > resource shuffling. If we eventually trigger a rebalance after the
> > > > rolling bounce, certain consumer setups are still faced with global
> > > > shuffles, for example a member.id-ranking-based round robin
> > > > strategy, as rejoining dynamic members will be assigned new
> > > > member.ids, which reorders the assignment. So I think the primary
> > > > goal of incremental rebalancing is still improving the cluster
> > > > availability during rebalance, because it doesn't revoke any
> > > > partitions during this process. Also, the perk is the minimal
> > > > configuration requirement :)
> > > >
> > > >
> > > > Best,
> > > >
> > > > Boyang
> > > >
> > > > ________________________________
> > > > From: Matthias J. Sax <matth...@confluent.io>
> > > > Sent: Tuesday, April 9, 2019 7:47 AM
> > > > To: dev
> > > > Subject: Re: [DISCUSS] KIP-429 : Smooth Auto-Scaling for Kafka
> Streams
> > > >
> > > > Thanks for the KIP, Boyang and Guozhang!
> > > >
> > > >
> > > > I made an initial pass and have some questions/comments. One
> > > > high-level comment: it seems that the KIP "mixes" the plain
> > > > consumer and Kafka Streams use cases a little bit (at least in the
> > > > presentation). It might be helpful to separate both cases clearly,
> > > > or maybe limit the scope to the plain consumer only.
> > > >
> > > >
> > > >
> > > > 10) For `PartitionAssignor.Assignment`: It seems we need a new method
> > > > `List<TopicPartition> revokedPartitions()`?
> > > >
> > > >
> > > >
> > > > 20) In Section "Consumer Coordinator Algorithm"
> > > >
> > > >     Bullet point "1a)": If the subscription changes and a topic is
> > > > removed from the subscription, why do we not revoke the partitions?
> > > >
> > > >     Bullet point "1a)": What happens is a topic is deleted (or a
> > > > partition is removed/deleted from a topic)? Should we call the new
> > > > `onPartitionsEmigrated()` callback for this case?
> > > >
> > > >     Bullet point "2b)" Should we update the `PartitionAssignor`
> > > > interface to pass in the "old assignment" as third parameter into
> > > > `assign()`?
> > > >
> > > >
> > > >
> > > > 30) Rebalance delay (as used in KIP-415): Could a rebalance delay
> > > > subsume KIP-345? Configuring static members is rather complicated,
> and
> > I
> > > > am wondering if a rebalance delay would be sufficient?
> > > >
> > > >
> > > >
> > > > 40) Quote: "otherwise the we would fall into the case 3.b) forever."
> > > >
> > > > What is "case 3.b" ?
> > > >
> > > >
> > > >
> > > > 50) Section "Looking into the Future"
> > > >
> > > > Nit: the new "ProtocolVersion" field is missing in the first line
> > > > describing "JoinGroupRequest"
> > > >
> > > > > This can also help save the "version probing" cost on Streams as
> > > > > well.
> > > >
> > > > How does this relate to the Kafka Streams "version probing"
> > > > implementation? How can we exploit the new `ProtocolVersion` in
> > > > Streams to improve "version probing"? I have a rough idea, but
> > > > would like to hear more details.
> > > >
> > > >
> > > >
> > > > 60) Section "Recommended Upgrade Procedure"
> > > >
> > > > > Set the `stream.rebalancing.mode` to `upgrading`, which will force
> > > > > the stream application to stay with protocol type "consumer".
> > > >
> > > > This config is not discussed in the KIP and appears in this section
> > > > without context. Can you elaborate about it?
> > > >
> > > >
> > > >
> > > > -Matthias
> > > >
> > > >
> > > >
> > > >
> > > > On 3/29/19 6:20 PM, Guozhang Wang wrote:
> > > > > Bumping up this discussion thread. I've added a few new drawings
> > > > > for better illustration, and would really appreciate your feedback.
> > > > >
> > > > >
> > > > > Guozhang
> > > > >
> > > > > On Wed, Mar 20, 2019 at 6:17 PM Guozhang Wang <wangg...@gmail.com>
> > > > wrote:
> > > > >
> > > > >> Hello Boyang,
> > > > >>
> > > > >> I've made another thorough pass over this KIP and I'd like to
> > > > >> split it into two parts: the first part, covered in KIP-429,
> > > > >> would touch the Consumer Coordinator only, to put the incremental
> > > > >> rebalance protocol in place. The second part (for now I've
> > > > >> reserved KIP number 444 for it) would contain all the changes on
> > > > >> StreamsPartitionAssignor to allow warming up new members.
> > > > >>
> > > > >> I think the first part, a.k.a. the current updated KIP-429, is
> > > > >> ready for review and discussion again. Would love to hear
> > > > >> people's feedback and ideas.
> > > > >>
> > > > >> Guozhang
> > > > >>
> > > > >>
> > > > >>
> > > > >> On Mon, Mar 4, 2019 at 10:09 AM Boyang Chen <bche...@outlook.com>
> > > > wrote:
> > > > >>
> > > > >>> Thanks Guozhang for the great questions. Answers are inlined:
> > > > >>>
> > > > >>> 1. I'm still not sure if it's worthwhile to add a new type of
> > > "learner
> > > > >>> task" in addition to "standby task": if the only difference is
> that
> > > for
> > > > >>> the
> > > > >>> latter, we would consider workload balance while for the former
> we
> > > > would
> > > > >>> not, I think we can just adjust the logic of StickyTaskAssignor a
> > bit
> > > > to
> > > > >>> break that difference. Adding a new type of task would be adding
> a
> > > lot
> > > > of
> > > > >>> code complexity, so if we can still piggy-back the logic on a
> > > > standby-task
> > > > >>> I would prefer to do so.
> > > > >>> In the proposal we stated that we are not adding a new type of
> > > > >>> task implementation. The learner task shall share the same
> > > > >>> implementation with the normal standby task; we shall only tag
> > > > >>> the standby task as a learner and prioritize the learner
> > > > >>> tasks' replay effort.
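> > > > >>> A tiny sketch of that prioritization (names hypothetical):
> > > > >>>
> > > > >>>     // A learner is just a standby task tagged at assignment
> > > > >>>     // time; restoration simply replays learner tasks first.
> > > > >>>     standbyTasks.sort(
> > > > >>>         Comparator.comparing((StandbyTask t) -> !t.isLearner()));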
> > > > >>> 2. One thing that's still not clear from the KIP wiki itself is
> > which
> > > > >>> layer
> > > > >>> would the logic be implemented at. Although for most KIPs we
> would
> > > not
> > > > >>> require internal implementation details but only public facing
> API
> > > > >>> updates,
> > > > >>> for a KIP like this I think it still requires fleshing out
> > > > >>> details of the implementation design. More specifically: today
> > > > >>> Streams embeds a full-fledged Consumer client, which
> > > > >>> hard-codes a ConsumerCoordinator inside;
> > > > >>> Streams then injects a StreamsPartitionAssignor to its pluggable
> > > > >>> PartitionAssignor interface and inside the
> StreamsPartitionAssignor
> > > we
> > > > >>> also
> > > > >>> have a TaskAssignor interface whose default implementation is
> > > > >>> StickyPartitionAssignor. Streams partition assignor logic today
> > > > >>> sits in
> > > > >>> the latter two classes. Hence the hierarchy today is:
> > > > >>>
> > > > >>> KafkaConsumer -> ConsumerCoordinator -> StreamsPartitionAssignor
> ->
> > > > >>> StickyTaskAssignor.
> > > > >>>
> > > > >>> We need to think about where the proposed implementation would
> take
> > > > place
> > > > >>> at, and personally I think it is not the best option to inject
> all
> > of
> > > > them
> > > > >>> into the StreamsPartitionAssignor / StickyTaskAssignor since the
> > > logic
> > > > of
> > > > >>> "triggering another rebalance" etc would require some coordinator
> > > logic
> > > > >>> which is hard to mimic at PartitionAssignor level. On the other
> > hand,
> > > > >>> since
> > > > >>> we are embedding a KafkaConsumer client as a whole we cannot just
> > > > replace
> > > > >>> ConsumerCoordinator with a specialized StreamsCoordinator like
> > > Connect
> > > > >>> does
> > > > >>> in KIP-415. So I'd like to maybe split the current proposal in
> both
> > > > >>> consumer layer and streams-assignor layer like we did in
> > > > KIP-98/KIP-129.
> > > > >>> And then the key thing to consider is how to cut off the boundary
> > so
> > > > that
> > > > >>> the modifications we push to ConsumerCoordinator would be
> > beneficial
> > > > >>> universally for any consumers, while keep the Streams-specific
> > logic
> > > at
> > > > >>> the
> > > > >>> assignor level.
> > > > >>> Yes, that's also my ideal plan. The details for the
> implementation
> > > are
> > > > >>> depicted
> > > > >>> in this doc:
> > > > >>> https://docs.google.com/document/d/1me2a5wvxAZT1QE6HkwyDl7C2TiBQlKN3Dpw_I1ro91U/edit#heading=h.qix74qdmekae
> > > > >>> and I have explained the reasoning on why we want to push a
> > > > >>> global change of replacing ConsumerCoordinator with
> > > StreamCoordinator.
> > > > >>> The motivation
> > > > >>> is that KIP space is usually used for public & algorithm level
> > > change,
> > > > >>> not for internal
> > > > >>> implementation details.
> > > > >>>
> > > > >>> 3. Depending on which design direction we choose, our migration
> > plan
> > > > would
> > > > >>> also be quite different. For example, if we stay with
> > > > ConsumerCoordinator
> > > > >>> whose protocol type is "consumer" still, and we can manage to
> make
> > > all
> > > > >>> changes agnostic to brokers as well as to old versioned
> consumers,
> > > then
> > > > >>> our
> > > > >>> migration plan could be much easier.
> > > > >>> Yes, the upgrade plan was designed to take the new
> > > > >>> StreamCoordinator approach, which means we shall define a new
> > > > >>> protocol type. The reason we could only maintain the same
> > > > >>> `consumer` protocol type for existing applications is that the
> > > > >>> current broker only allows changing the protocol type when the
> > > > >>> consumer group is empty. It is of course user-unfriendly to force
> > > > >>> a wipe-out of the entire application, and I don't think
> > > > >>> maintaining the old protocol type would greatly impact ongoing
> > > > >>> services using the new stream coordinator. WDYT?
> > > > >>>
> > > > >>> 4. I think one major issue related to this KIP is that today, in
> > the
> > > > >>> StickyPartitionAssignor, we always try to honor stickiness over
> > > > workload
> > > > >>> balance, and hence "learner task" is needed to break this
> priority,
> > > but
> > > > >>> I'm
> > > > >>> wondering if we can have a better solution within sticky task
> > > assignor
> > > > >>> that
> > > > >>> accommodates this?
> > > > >>> Great question! That's what I explained in the proposal, which
> > > > >>> is that we should break down our delivery into different
> > > > >>> stages. At the very beginning, our goal is to trigger learner
> > > > >>> task assignment only on `new` hosts, which we shall figure out
> > > > >>> by leveraging the leader's knowledge of the previous round of
> > > > >>> rebalance. After stage one, our goal is to have a smooth
> > > > >>> scaling-up experience, but the task balance problem is kind of
> > > > >>> orthogonal. The load balance problem is a much broader topic
> > > > >>> than auto scaling, which I figure is worth discussing within
> > > > >>> this KIP's context since it's a natural next step, but it
> > > > >>> wouldn't be the main topic. Learner task or auto scaling
> > > > >>> support should be treated as `a helpful mechanism to reach
> > > > >>> load balance`, but not `an algorithm defining load balance`.
> > > > >>> It would be great if you could share some insights on stream
> > > > >>> task balance, which would eventually help us break out of
> > > > >>> KIP-429's scope and even define a separate KIP focusing on
> > > > >>> task weight & assignment logic improvement.
> > > > >>>
> > > > >>> Also, thank you for making improvements to the KIP content and
> > > > >>> organization!
> > > > >>>
> > > > >>> Best,
> > > > >>> Boyang
> > > > >>> ________________________________
> > > > >>> From: Guozhang Wang <wangg...@gmail.com>
> > > > >>> Sent: Saturday, March 2, 2019 6:00 AM
> > > > >>> To: dev
> > > > >>> Subject: Re: [DISCUSS] KIP-429 : Smooth Auto-Scaling for Kafka
> > > Streams
> > > > >>>
> > > > >>> Hello Boyang,
> > > > >>>
> > > > >>> I've just made a quick pass on the KIP and here are some
> thoughts.
> > > > >>>
> > > > >>> Meta:
> > > > >>>
> > > > >>> 1. I'm still not sure if it's worthwhile to add a new type of
> > > "learner
> > > > >>> task" in addition to "standby task": if the only difference is
> that
> > > for
> > > > >>> the
> > > > >>> latter, we would consider workload balance while for the former
> we
> > > > would
> > > > >>> not, I think we can just adjust the logic of StickyTaskAssignor a
> > bit
> > > > to
> > > > >>> break that difference. Adding a new type of task would be adding
> a
> > > lot
> > > > of
> > > > >>> code complexity, so if we can still piggy-back the logic on a
> > > > standby-task
> > > > >>> I would prefer to do so.
> > > > >>>
> > > > >>> 2. One thing that's still not clear from the KIP wiki itself is
> > which
> > > > >>> layer
> > > > >>> would the logic be implemented at. Although for most KIPs we
> would
> > > not
> > > > >>> require internal implementation details but only public facing
> API
> > > > >>> updates,
> > > > >>> for a KIP like this I think it still requires fleshing out
> > > > >>> details of the implementation design. More specifically: today
> > > > >>> Streams embeds a full-fledged Consumer client, which
> > > > >>> hard-codes a ConsumerCoordinator inside;
> > > > >>> Streams then injects a StreamsPartitionAssignor to its pluggable
> > > > >>> PartitionAssignor interface and inside the
> StreamsPartitionAssignor
> > > we
> > > > >>> also
> > > > >>> have a TaskAssignor interface whose default implementation is
> > > > >>> StickyPartitionAssignor. Streams partition assignor logic today
> > > > >>> sits in
> > > > >>> the latter two classes. Hence the hierarchy today is:
> > > > >>>
> > > > >>> KafkaConsumer -> ConsumerCoordinator -> StreamsPartitionAssignor
> ->
> > > > >>> StickyTaskAssignor.
> > > > >>>
> > > > >>> We need to think about where the proposed implementation would
> take
> > > > place
> > > > >>> at, and personally I think it is not the best option to inject
> all
> > of
> > > > them
> > > > >>> into the StreamsPartitionAssignor / StickyTaskAssignor since the
> > > logic
> > > > of
> > > > >>> "triggering another rebalance" etc would require some coordinator
> > > logic
> > > > >>> which is hard to mimic at PartitionAssignor level. On the other
> > hand,
> > > > >>> since
> > > > >>> we are embedding a KafkaConsumer client as a whole we cannot just
> > > > replace
> > > > >>> ConsumerCoordinator with a specialized StreamsCoordinator like
> > > Connect
> > > > >>> does
> > > > >>> in KIP-415. So I'd like to maybe split the current proposal in
> both
> > > > >>> consumer layer and streams-assignor layer like we did in
> > > > KIP-98/KIP-129.
> > > > >>> And then the key thing to consider is how to cut off the boundary
> > so
> > > > that
> > > > >>> the modifications we push to ConsumerCoordinator would be
> > beneficial
> > > > >>> universally for any consumers, while keep the Streams-specific
> > logic
> > > at
> > > > >>> the
> > > > >>> assignor level.
> > > > >>>
> > > > >>> 3. Depending on which design direction we choose, our migration
> > plan
> > > > would
> > > > >>> also be quite different. For example, if we stay with
> > > > ConsumerCoordinator
> > > > >>> whose protocol type is "consumer" still, and we can manage to
> make
> > > all
> > > > >>> changes agnostic to brokers as well as to old versioned
> consumers,
> > > then
> > > > >>> our
> > > > >>> migration plan could be much easier.
> > > > >>>
> > > > >>> 4. I think one major issue related to this KIP is that today, in
> > the
> > > > >>> StickyPartitionAssignor, we always try to honor stickiness over
> > > > workload
> > > > >>> balance, and hence "learner task" is needed to break this
> priority,
> > > but
> > > > >>> I'm
> > > > >>> wondering if we can have a better solution within sticky task
> > > assignor
> > > > >>> that
> > > > >>> accommodates this?
> > > > >>>
> > > > >>> Minor:
> > > > >>>
> > > > >>> 1. The idea of two rebalances has also been discussed in
> > > > >>> https://issues.apache.org/jira/browse/KAFKA-6145, so we should
> > > > >>> add the reference on the wiki page as well.
> > > > >>> 2. Could you also add a section describing how the subscription /
> > > > >>> assignment metadata will be re-formatted? Without this information
> > > > >>> it is hard to get to the bottom of your idea. For example, in the
> > > > >>> "Leader Transfer Before Scaling" section, I'm not sure why "S2
> > > > >>> doesn't know S4 is a new member" and hence would blindly obey
> > > > >>> stickiness over the workload balance requirement.
> > > > >>>
> > > > >>> Guozhang
> > > > >>>
> > > > >>>
> > > > >>> On Thu, Feb 28, 2019 at 11:05 AM Boyang Chen <
> bche...@outlook.com>
> > > > wrote:
> > > > >>>
> > > > >>>> Hey community friends,
> > > > >>>>
> > > > >>>> I'm gladly inviting you to have a look at the proposal to add
> > > > >>> incremental
> > > > >>>> rebalancing to Kafka Streams, a.k.a. auto-scaling support.
> > > > >>>>
> > > > >>>>
> > > > >>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-429%3A+Smooth+Auto-Scaling+for+Kafka+Streams
> > > > >>>>
> > > > >>>> Special thanks to Guozhang for giving great guidance and
> > > > >>>> important feedback while making this KIP!
> > > > >>>>
> > > > >>>> Best,
> > > > >>>> Boyang
> > > > >>>>
> > > > >>>
> > > > >>>
> > > > >>> --
> > > > >>> -- Guozhang
> > > > >>>
> > > > >>
> > > > >>
> > > > >> --
> > > > >> -- Guozhang
> > > > >>
> > > > >
> > > > >
> > > >
> > > >
> > >
> > > --
> > > -- Guozhang
> > >
> >
>
>
> --
> -- Guozhang
>


-- 
-- Guozhang
