[jira] [Updated] (KAFKA-13816) Downgrading Connect rebalancing protocol from incremental to eager causes duplicate task instances

Chris Egerton (Jira) Mon, 11 Apr 2022 16:03:06 -0700


     [ 
https://issues.apache.org/jira/browse/KAFKA-13816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Chris Egerton updated KAFKA-13816:
----------------------------------
    Description: 
The rebalancing protocol for a Kafka Connect cluster can be downgraded from 
incremental to eager by adding a worker to the cluster with 
{{connect.protocol}} set to {{{}eager{}}}, or by stopping an existing worker in 
that cluster, reconfiguring it with the new protocol, and restarting it.

When the worker (re)joins the cluster, a rebalance takes place using the eager 
protocol, and duplicate task instances are created on the cluster.

This occurs because:
 * The leader does not send out an assignment that revokes all connectors and 
tasks for the cluster during that round
 * Workers do not respond to the downgrade in protocol by revoking all 
connectors and tasks that they were running before the rebalance that are not 
included in the new assignment they received during the rebalance
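
As a rough illustration of the second point, the sketch below computes which tasks a worker would need to stop after an eager rebalance: everything it was running beforehand that is absent from its new assignment. The class name, method name and task IDs are hypothetical and are not part of the Connect codebase. This is a minimal sketch of the missing behavior, not a proposed patch.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; none of these names exist in Kafka Connect.
class EagerDowngradeRevocation {

    // Returns the tasks that were running before the rebalance but are not
    // part of the new assignment, i.e. the ones the worker should stop.
    static Set<String> tasksToRevoke(Set<String> runningBefore, Set<String> newAssignment) {
        Set<String> toRevoke = new HashSet<>(runningBefore);
        toRevoke.removeAll(newAssignment);
        return toRevoke;
    }

    public static void main(String[] args) {
        // Assumed example state: three tasks running, one kept after the rebalance.
        Set<String> running = Set.of("conn-a-0", "conn-a-1", "conn-b-0");
        Set<String> assigned = Set.of("conn-a-0");
        // TreeSet is used only to print the result in a stable order.
        System.out.println(new TreeSet<>(tasksToRevoke(running, assigned)));
    }
}
```

If a worker skips this step, any task it keeps running that the leader has handed to another worker ends up with two live instances, which matches the duplication described above.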

It's likely that this bug hasn't surfaced sooner because any subsequent 
rebalance should cause all connectors and tasks on each worker in the cluster to 
be proactively revoked before the worker rejoins the group.

[KIP-415|https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect#KIP415:IncrementalCooperativeRebalancinginKafkaConnect-Compatibility,Deprecation,andMigrationPlan]
 provides one way to address this:
{quote}To downgrade your cluster to use protocol version 0 from version 1 or 
higher with {{eager}} rebalancing policy what is required is to switch one of 
the workers back to {{eager}} mode. 
{panel}
 {panel}
|{{connect.protocol = eager}}|

Once this worker joins, the group will downgrade to protocol version 0 and 
{{eager}} rebalancing policy, with immediately release of resources upon 
joining the group. This process will require a one-time double rebalancing, 
with the leader detecting the downgrade and first sending a downgraded 
assignment with empty assigned connectors and tasks and from then on just 
regular downgraded assignments. 
{quote}
However, it's unclear how to accomplish the second round in the double 
rebalance described above.
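
To make the leader's side of that quote concrete, here is a minimal sketch of an assignor that hands out an empty assignment in the round where it first sees a downgrade to protocol version 0, and a regular assignment in every later round. All names are hypothetical, the round-robin distribution is an assumption chosen for brevity, and the sketch deliberately leaves out how the second round would be triggered, since that is the open question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not the Connect leader's real logic.
class DowngradeAwareLeader {

    // Assumption: the group was using the incremental protocol (version 1).
    private int previousProtocolVersion = 1;

    // Computes a worker -> tasks assignment for one rebalance round.
    Map<String, List<String>> assign(int protocolVersion, List<String> workers, List<String> tasks) {
        boolean downgraded = previousProtocolVersion >= 1 && protocolVersion == 0;
        previousProtocolVersion = protocolVersion;

        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String worker : workers) {
            assignment.put(worker, new ArrayList<>());
        }
        // First round after a downgrade: everyone gets an empty assignment.
        if (downgraded || workers.isEmpty()) {
            return assignment;
        }
        // Later rounds: plain round-robin, standing in for a regular eager assignment.
        for (int i = 0; i < tasks.size(); i++) {
            assignment.get(workers.get(i % workers.size())).add(tasks.get(i));
        }
        return assignment;
    }

    public static void main(String[] args) {
        DowngradeAwareLeader leader = new DowngradeAwareLeader();
        List<String> workers = List.of("w1", "w2");
        List<String> tasks = List.of("t0", "t1", "t2");
        System.out.println(leader.assign(0, workers, tasks)); // round one: empty
        System.out.println(leader.assign(0, workers, tasks)); // round two: regular
    }
}
```

The sketch only produces a regular assignment if `assign` is called a second time. Nothing in it forces that follow-up rebalance to happen.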

  was:
The rebalancing protocol for a Kafka Connect cluster can be downgraded from 
incremental to eager by adding a worker to the cluster with 
{{connect.protocol}} set to {{{}eager{}}}, or by stopping an existing worker in 
that cluster, reconfiguring it with the new protocol, and restarting it.

When the worker (re)joins the cluster, a rebalance takes place using the eager 
protocol, and duplicate task instances are created on the cluster.

This occurs because:
 * The leader does not send out an assignment that revokes all connectors and 
tasks for the cluster during that round
 * Workers do not respond to the downgrade in protocol by revoking all 
connectors and tasks that they were running before the rebalance that are not 
included in the new assignment they received during the rebalance

It's likely that this bug hasn't surfaced sooner because any subsequent 
rebalance should cause all connectors and tasks on each worker in the cluster to 
be proactively revoked before the worker rejoins the group.

[KIP-415|https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect#KIP415:IncrementalCooperativeRebalancinginKafkaConnect-Compatibility,Deprecation,andMigrationPlan]
 provides one way to address this:
{quote}To downgrade your cluster to use protocol version 0 from version 1 or 
higher with {{eager}} rebalancing policy what is required is to switch one of 
the workers back to {{eager}} mode. 
{panel}
{panel}
|{{connect.protocol = eager}}|

Once this worker joins, the group will downgrade to protocol version 0 and 
{{eager}} rebalancing policy, with immediately release of resources upon 
joining the group. This process will require a one-time double rebalancing, 
with the leader detecting the downgrade and first sending a downgraded 
assignment with empty assigned connectors and tasks and from then on just 
regular downgraded assignments. 
{quote}


> Downgrading Connect rebalancing protocol from incremental to eager causes 
> duplicate task instances
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13816
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13816
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>            Reporter: Chris Egerton
>            Priority: Major
>
> The rebalancing protocol for a Kafka Connect cluster can be downgraded from 
> incremental to eager by adding a worker to the cluster with 
> {{connect.protocol}} set to {{{}eager{}}}, or by stopping an existing worker 
> in that cluster, reconfiguring it with the new protocol, and restarting it.
> When the worker (re)joins the cluster, a rebalance takes place using the 
> eager protocol, and duplicate task instances are created on the cluster.
> This occurs because:
>  * The leader does not send out an assignment that revokes all connectors and 
> tasks for the cluster during that round
>  * Workers do not respond to the downgrade in protocol by revoking all 
> connectors and tasks that they were running before the rebalance that are not 
> included in the new assignment they received during the rebalance
> It's likely that this bug hasn't surfaced sooner because any subsequent 
> rebalance should cause all connectors and tasks on each worker in the cluster to 
> be proactively revoked before the worker rejoins the group.
> [KIP-415|https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect#KIP415:IncrementalCooperativeRebalancinginKafkaConnect-Compatibility,Deprecation,andMigrationPlan]
>  provides one way to address this:
> {quote}To downgrade your cluster to use protocol version 0 from version 1 or 
> higher with {{eager}} rebalancing policy what is required is to switch one of 
> the workers back to {{eager}} mode. 
> {panel}
>  {panel}
> |{{connect.protocol = eager}}|
> Once this worker joins, the group will downgrade to protocol version 0 and 
> {{eager}} rebalancing policy, with immediately release of resources upon 
> joining the group. This process will require a one-time double rebalancing, 
> with the leader detecting the downgrade and first sending a downgraded 
> assignment with empty assigned connectors and tasks and from then on just 
> regular downgraded assignments. 
> {quote}
> However, it's unclear how to accomplish the second round in the double 
> rebalance described above.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
