[jira] [Commented] (KAFKA-13816) Downgrading Connect rebalancing protocol from incremental to eager causes duplicate task instances

Chris Egerton (Jira) Mon, 11 Apr 2022 06:35:10 -0700


    [ 
https://issues.apache.org/jira/browse/KAFKA-13816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520572#comment-17520572
 ]


Chris Egerton commented on KAFKA-13816:
---------------------------------------

I won't have time to work on this, but I have put together a preliminary 
integration test that reproduces the issue: 
[https://github.com/C0urante/kafka/commit/a9f119dbf211d33193d6597c0c806b909ba219d2]

> Downgrading Connect rebalancing protocol from incremental to eager causes 
> duplicate task instances
> --------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-13816
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13816
>             Project: Kafka
>          Issue Type: Bug
>          Components: KafkaConnect
>            Reporter: Chris Egerton
>            Priority: Major
>
> The rebalancing protocol for a Kafka Connect cluster can be downgraded from 
> incremental to eager by adding a worker to the cluster with 
> {{connect.protocol}} set to {{eager}}, or by stopping an existing worker 
> in that cluster, reconfiguring it with the new protocol, and restarting it.
> When the worker (re)joins the cluster, a rebalance takes place using the 
> eager protocol, and duplicate task instances are created on the cluster.
> This occurs because:
>  * The leader does not send out an assignment that revokes all connectors and 
> tasks for the cluster during that round
>  * Workers do not respond to the protocol downgrade by revoking the 
> connectors and tasks that they were running before the rebalance but that 
> are not included in the new assignment they received during it
> It's likely that this bug hasn't surfaced sooner because any subsequent 
> rebalance should cause all connectors and tasks on each worker in the 
> cluster to be proactively revoked before the worker rejoins the group.
> [KIP-415|https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect#KIP415:IncrementalCooperativeRebalancinginKafkaConnect-Compatibility,Deprecation,andMigrationPlan]
>  provides one way to address this:
> {quote}To downgrade your cluster to use protocol version 0 from version 1 or 
> higher with {{eager}} rebalancing policy what is required is to switch one of 
> the workers back to {{eager}} mode. 
> {{connect.protocol = eager}}
> Once this worker joins, the group will downgrade to protocol version 0 and 
> {{eager}} rebalancing policy, with immediately release of resources upon 
> joining the group. This process will require a one-time double rebalancing, 
> with the leader detecting the downgrade and first sending a downgraded 
> assignment with empty assigned connectors and tasks and from then on just 
> regular downgraded assignments. 
> {quote}
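
The two missing behaviors listed in the description can be modeled with a small sketch. This is not Kafka Connect code: every name here (`leader_assignments`, `tasks_to_revoke`, `apply_rounds`, `duplicated_tasks`) and the worker-to-task-set data shape are hypothetical, chosen only to illustrate the one-time double rebalance that KIP-415 describes and the worker-side revocation of anything absent from the new assignment. The "buggy" path assumes a worker simply keeps its old tasks alongside the new ones, which is a simplification of the reported behavior.

```python
from typing import Dict, List, Set

Assignment = Dict[str, Set[str]]  # worker id -> task ids (hypothetical shape)
EAGER_VERSION = 0  # KIP-415 refers to eager rebalancing as protocol version 0


def leader_assignments(previous_version: int, new_version: int,
                       target: Assignment) -> List[Assignment]:
    """Leader side: return the assignments to send, one per rebalance round.

    On a downgrade to eager, send an empty assignment first, then the
    regular one (the "one-time double rebalancing" from KIP-415).
    """
    if previous_version > EAGER_VERSION and new_version == EAGER_VERSION:
        empty = {worker: set() for worker in target}
        return [empty, target]
    return [target]


def tasks_to_revoke(running: Set[str], assigned: Set[str]) -> Set[str]:
    """Worker side: everything running that the new assignment omits."""
    return running - assigned


def apply_rounds(running: Assignment, rounds: List[Assignment],
                 revoke: bool) -> Assignment:
    """Replay rebalance rounds; revoke=False models the reported bug."""
    state = {worker: set(tasks) for worker, tasks in running.items()}
    for assignment in rounds:
        for worker, assigned in assignment.items():
            current = state.get(worker, set())
            if revoke:
                current = current - tasks_to_revoke(current, assigned)
            state[worker] = current | assigned
    return state


def duplicated_tasks(state: Assignment) -> Set[str]:
    """Tasks that are running on more than one worker."""
    seen: Set[str] = set()
    dupes: Set[str] = set()
    for tasks in state.values():
        dupes |= seen & tasks
        seen |= tasks
    return dupes


if __name__ == "__main__":
    before = {"w1": {"task-0", "task-1"}, "w2": {"task-2"}}
    # w3 is the worker that joins with connect.protocol=eager
    target = {"w1": {"task-0"}, "w2": {"task-1"}, "w3": {"task-2"}}

    buggy = apply_rounds(before, [target], revoke=False)
    print(sorted(duplicated_tasks(buggy)))  # ['task-1', 'task-2']

    fixed = apply_rounds(before, leader_assignments(1, 0, target), revoke=True)
    print(sorted(duplicated_tasks(fixed)))  # []
```

In this model, either fix alone removes the duplicates: an empty first round clears every worker before the regular assignment arrives, and worker-side revocation drops stale tasks even within a single round.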



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
