[jira] [Commented] (KAFKA-10678) Re-deploying Streams app causes rebalance and task migration

A. Sophie Blee-Goldman (Jira) Thu, 12 Nov 2020 17:55:52 -0800


    [ https://issues.apache.org/jira/browse/KAFKA-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231084#comment-17231084 ]


A. Sophie Blee-Goldman commented on KAFKA-10678:
------------------------------------------------

Yeah, it seems that any time a member is restarted and its randomly generated 
UUID places it in a different order relative to the other clients, you can 
get this task migration. I filed KAFKA-10716 so we can look into this right 
away rather than wait on KAFKA-10121.

I can't think of a true workaround in the meantime, but you could set the 
"max.warmup.replicas" config to 1 to slow down the movement of tasks (at the 
cost of some speed when scaling out, etc.). It's also possible to revert to 
the old assignor via an internal backdoor meant for emergencies. Obviously 
that means sacrificing the new HA guarantees, but it may work well enough if, 
for example, you have frequent restarts but the group membership is generally 
stable (and state isn't lost, etc.).
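
As a rough illustration of the "max.warmup.replicas" suggestion above, here is 
a minimal sketch of building the Streams properties with that setting. The 
class name, helper method, application id, and bootstrap address are 
hypothetical placeholders, not taken from the reporter's app; only the 
"max.warmup.replicas" key and the value 1 come from the comment. Plain string 
keys are used so the snippet compiles without the Kafka jars on the classpath.

```java
import java.util.Properties;

public class WarmupConfigSketch {

    // Hypothetical helper: builds the properties you would pass to KafkaStreams.
    // applicationId and bootstrapServers are placeholders chosen for this example.
    static Properties buildStreamsProps(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Allow at most one warmup replica at a time, which slows task movement
        // at the cost of slower scale-out, as described in the comment above.
        props.setProperty("max.warmup.replicas", "1");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildStreamsProps("example-app", "localhost:9092");
        System.out.println("max.warmup.replicas=" + props.getProperty("max.warmup.replicas"));
    }
}
```

In a real application the resulting Properties would be handed to the 
KafkaStreams constructor along with the topology. This only limits how quickly 
tasks migrate; it does not prevent the rebalance itself.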

> Re-deploying Streams app causes rebalance and task migration
> ------------------------------------------------------------
>
>                 Key: KAFKA-10678
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10678
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.0, 2.6.1
>            Reporter: Bradley Peterson
>            Priority: Major
>         Attachments: after, before, broker
>
>
> Re-deploying our Streams app causes a rebalance, even when using static group 
> membership. Worse, the rebalance creates standby tasks, even when the 
> previous task assignment was balanced and stable.
> Our app is currently using Streams 2.6.1-SNAPSHOT (due to [KAFKA-10633]) but 
> we saw the same behavior in 2.6.0. The app runs on 4 EC2 instances, each with 
> 4 streams threads, and data stored on persistent EBS volumes. During a 
> redeploy, all EC2 instances are stopped, new instances are launched, and the 
> EBS volumes are attached to the new instances. We do not use interactive 
> queries. {{session.timeout.ms}} is set to 30 minutes, and the deployment 
> finishes well under that. {{num.standby.replicas}} is 0.
> h2. Expected Behavior
> Given a stable and balanced task assignment prior to deploying, we expect to 
> see the same task assignment after deploying. Even if a rebalance is 
> triggered, we do not expect to see new standby tasks.
> h2. Observed Behavior
> Attached are the "Assigned tasks to clients" log lines from before and after 
> deploying. The "before" is from over 24 hours ago; the task assignment is 
> well balanced and "Finished stable assignment of tasks, no followup 
> rebalances required." is logged. The "after" log lines show the same 
> assignment of active tasks, but some additional standby tasks. There are 
> additional log lines about adding and removing active tasks, which I don't 
> quite understand.
> I've also included logs from the broker showing the rebalance was triggered 
> for "Updating metadata".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
