[ 
https://issues.apache.org/jira/browse/KAFKA-10633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17219861#comment-17219861
 ] 

A. Sophie Blee-Goldman commented on KAFKA-10633:
------------------------------------------------

A thread should always reset its scheduled rebalance after triggering one, and 
it only sets the rebalance schedule when it receives its assignment after a 
rebalance. The probing rebalance is always scheduled for 10 minutes past the 
current time, so seeing the same time printed in the "Triggering the followup 
rebalance" and "Requested to schedule probing rebalance" messages indicates 
that the member did not actually go through a rebalance; it just received its 
previous assignment directly from the broker.

Also, a rebalance will never complete in under a second, so seeing 
`Triggering the followup rebalance scheduled for 1603323868771 ms` followed 
immediately by `Requested to schedule probing rebalance for 1603323868771 ms` 
seems to confirm that it did not in fact go through a rebalance.
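
To make the timestamp argument concrete, here is a minimal sketch of how one might check a pair of log lines for this symptom. The class `ProbingRebalanceCheck` and its methods are hypothetical helpers written for illustration, not part of Kafka Streams. The message formats are assumed to match the two lines quoted in this ticket.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical log-analysis helper; nothing here is a Kafka Streams API.
final class ProbingRebalanceCheck {
    // Patterns assumed from the two log messages quoted in this ticket.
    private static final Pattern TRIGGERED =
        Pattern.compile("Triggering the followup rebalance scheduled for (\\d+) ms");
    private static final Pattern REQUESTED =
        Pattern.compile("Requested to schedule probing rebalance for (\\d+) ms");

    // Extracts the epoch-millisecond timestamp from a log line, if present.
    static OptionalLong extract(Pattern pattern, String line) {
        Matcher m = pattern.matcher(line);
        return m.find()
            ? OptionalLong.of(Long.parseLong(m.group(1)))
            : OptionalLong.empty();
    }

    // Returns true when the newly requested schedule equals the schedule that
    // was just triggered. A schedule set after a real rebalance would be about
    // 10 minutes past the current time, so it could not equal the old one.
    static boolean sameScheduleReused(String triggeredLine, String requestedLine) {
        OptionalLong triggered = extract(TRIGGERED, triggeredLine);
        OptionalLong requested = extract(REQUESTED, requestedLine);
        if (!triggered.isPresent() || !requested.isPresent()) {
            return false; // not the message pair we are looking for
        }
        return triggered.getAsLong() == requested.getAsLong();
    }
}
```

Calling `sameScheduleReused` with the two messages from the report, both carrying 1603323868771, returns true. A requested time 10 minutes later than the triggered one returns false.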

[~thebearmayor] this issue is fixed in 2.7.0 and 2.6.1. I'm not sure when 2.6.1 
will be released, but the 2.7.0 release is currently in progress, so it should 
hopefully be available soon. Apologies for our oversight in designing KIP-441.

> Constant probing rebalances in Streams 2.6
> ------------------------------------------
>
>                 Key: KAFKA-10633
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10633
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.6.0
>            Reporter: Bradley Peterson
>            Priority: Major
>         Attachments: Discover 2020-10-21T23 34 03.867Z - 2020-10-21T23 44 
> 46.409Z.csv
>
>
> We are seeing a few issues with the new rebalancing behavior in Streams 2.6. 
> This ticket is for constant probing rebalances on one StreamThread, but I'll 
> mention the other issues, as they may be related.
> First, when we redeploy the application we see tasks being moved, even though 
> the task assignment was stable before redeploying. We would expect to see 
> tasks assigned back to the same instances and no movement. The application is 
> in EC2, with persistent EBS volumes, and we use static group membership to 
> avoid rebalancing. To redeploy the app we terminate all EC2 instances. The 
> new instances will reattach the EBS volumes and use the same group member id.
> After redeploying, we sometimes see the group leader go into a tight probing 
> rebalance loop. This doesn't happen immediately; it can be several hours 
> later. Because the redeploy caused task movement, we see the expected probing 
> rebalances every 10 minutes. But then one thread goes into a tight loop, 
> logging messages like "Triggering the followup rebalance scheduled for 
> 1603323868771 ms.", handling the partition assignment (which doesn't change), 
> and then logging "Requested to schedule probing rebalance for 1603323868771 
> ms." This repeats several times a second until the app is restarted again. 
> I'll attach a log export from one such incident.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
