[ https://issues.apache.org/jira/browse/KAFKA-10678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17231084#comment-17231084 ]
A. Sophie Blee-Goldman commented on KAFKA-10678:
------------------------------------------------

Yeah, it seems like anytime a member is restarted and the randomly generated UUID places it in a different order relative to all the other clients, you can get this task migration. I filed KAFKA-10716 so we can look into this right away rather than wait on KAFKA-10121.

I can't think of a true workaround for the meantime, but you could set the "max.warmup.replicas" config to 1 to slow down the movement of tasks (at the cost of some speed when scaling out, etc).

It's also possible to revert to using the old assignor with an internal backdoor for emergencies. Obviously that means sacrificing the new HA guarantees, but it may work well enough for example if you have frequent restarts but the group membership is generally stable (and state isn't lost, etc)

> Re-deploying Streams app causes rebalance and task migration
> ------------------------------------------------------------
>
>                 Key: KAFKA-10678
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10678
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.0, 2.6.1
>            Reporter: Bradley Peterson
>            Priority: Major
>         Attachments: after, before, broker
>
>
> Re-deploying our Streams app causes a rebalance, even when using static group
> membership. Worse, the rebalance creates standby tasks, even when the
> previous task assignment was balanced and stable.
> Our app is currently using Streams 2.6.1-SNAPSHOT (due to [KAFKA-10633]) but
> we saw the same behavior in 2.6.0. The app runs on 4 EC2 instances, each with
> 4 streams threads, and data stored on persistent EBS volumes. During a
> redeploy, all EC2 instances are stopped, new instances are launched, and the
> EBS volumes are attached to the new instances. We do not use interactive
> queries. {{session.timeout.ms}} is set to 30 minutes, and the deployment
> finishes well under that. {{num.standby.replicas}} is 0.
> h2. Expected Behavior
> Given a stable and balanced task assignment prior to deploying, we expect to
> see the same task assignment after deploying. Even if a rebalance is
> triggered, we do not expect to see new standby tasks.
> h2. Observed Behavior
> Attached are the "Assigned tasks to clients" log lines from before and after
> deploying. The "before" is from over 24 hours ago, the task assignment is
> well balanced and "Finished stable assignment of tasks, no followup
> rebalances required." is logged. The "after" log lines show the same
> assignment of active tasks, but some additional standby tasks. There are
> additional log lines about adding and removing active tasks, which I don't
> quite understand.
> I've also included logs from the broker showing the rebalance was triggered
> for "Updating metadata".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
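
For reference, the "max.warmup.replicas" suggestion in the comment above could be applied roughly as follows. This is only a sketch: the class name, method name, application id, and bootstrap address are hypothetical placeholders, and it uses plain string keys with java.util.Properties so it needs no Kafka dependency. In a real app, the resulting Properties would be passed to the KafkaStreams constructor.

```java
import java.util.Properties;

public class WarmupReplicasSketch {

    // Builds Streams client properties with warmup replicas capped at 1.
    // applicationId and bootstrapServers are placeholders supplied by the caller.
    static Properties buildStreamsProps(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.put("application.id", applicationId);
        props.put("bootstrap.servers", bootstrapServers);
        // Suggested mitigation from the comment: limit warmup replicas to 1
        // so that tasks move between clients more slowly.
        props.put("max.warmup.replicas", 1);
        // Matches the reporter's setup, which runs with no standby replicas.
        props.put("num.standby.replicas", 0);
        return props;
    }

    public static void main(String[] args) {
        // Hypothetical values, for illustration only.
        Properties props = buildStreamsProps("my-streams-app", "localhost:9092");
        System.out.println("max.warmup.replicas=" + props.get("max.warmup.replicas"));
    }
}
```

As the comment notes, this slows task movement down rather than preventing it, and the trade-off is slower scale-out.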