[https://issues.apache.org/jira/browse/KAFKA-13816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17520572#comment-17520572]
Chris Egerton commented on KAFKA-13816:
---------------------------------------
I won't have time to work on this, but I have put together a preliminary
integration test that reproduces the issue:
[https://github.com/C0urante/kafka/commit/a9f119dbf211d33193d6597c0c806b909ba219d2]
> Downgrading Connect rebalancing protocol from incremental to eager causes
> duplicate task instances
> --------------------------------------------------------------------------------------------------
>
> Key: KAFKA-13816
> URL: https://issues.apache.org/jira/browse/KAFKA-13816
> Project: Kafka
> Issue Type: Bug
> Components: KafkaConnect
> Reporter: Chris Egerton
> Priority: Major
>
> The rebalancing protocol for a Kafka Connect cluster can be downgraded from
> incremental to eager by adding a worker to the cluster with
> {{connect.protocol}} set to {{eager}}, or by stopping an existing worker
> in that cluster, reconfiguring it with the new protocol, and restarting it.
> When the worker (re)joins the cluster, a rebalance takes place using the
> eager protocol, and duplicate task instances are created on the cluster.
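A simplified model of why a single eager worker is enough to downgrade the whole group. It assumes that each worker advertises every protocol version up to the one it is configured with, and that the group settles on the newest version that all members support. `ConnectProtocol`, `supported_protocols`, and `negotiate` are hypothetical names for illustration, not Kafka Connect's actual API.

```python
from __future__ import annotations

from enum import IntEnum


class ConnectProtocol(IntEnum):
    # Hypothetical enum; values follow the Connect protocol version numbers.
    EAGER = 0       # version 0, eager rebalancing
    COMPATIBLE = 1  # version 1, incremental cooperative rebalancing
    SESSIONED = 2   # version 2, incremental cooperative with session keys


def supported_protocols(configured: ConnectProtocol) -> set[ConnectProtocol]:
    # Assumption: a worker configured for version N also speaks every older version.
    return {p for p in ConnectProtocol if p <= configured}


def negotiate(configured_per_worker: list[ConnectProtocol]) -> ConnectProtocol:
    # The group can only use a protocol that every member supports.
    common = set(ConnectProtocol)
    for configured in configured_per_worker:
        common &= supported_protocols(configured)
    # EAGER is supported by every worker, so `common` is never empty.
    return max(common)


# Two incremental workers keep the group on incremental rebalancing...
print(negotiate([ConnectProtocol.SESSIONED, ConnectProtocol.SESSIONED]).name)  # SESSIONED
# ...but one added eager worker downgrades the whole group.
print(negotiate([ConnectProtocol.SESSIONED, ConnectProtocol.SESSIONED,
                 ConnectProtocol.EAGER]).name)  # EAGER
```

In this model, the newly joined eager worker never runs anything incremental. It simply narrows the set of protocols the group has in common down to version 0.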
> This occurs because:
> * The leader does not send out an assignment that revokes all connectors and
> tasks for the cluster during that round
> * Workers do not respond to the downgrade in protocol by revoking all
> connectors and tasks that they were running before the rebalance that are not
> included in the new assignment they received during the rebalance
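The two causes above can be reproduced in a toy simulation. It assumes the leader computes the eager assignment from scratch, using round-robin purely for illustration rather than the real Connect assignor, and that each worker either keeps or stops tasks missing from its new assignment. `round_robin`, `apply_assignment`, and `duplicated_tasks` are hypothetical helpers.

```python
from __future__ import annotations

from collections import Counter


def round_robin(tasks: list[str], workers: list[str]) -> dict[str, set[str]]:
    # Illustrative eager assignment: computed from scratch, ignoring who
    # currently runs what (not the real Connect assignor).
    assignment: dict[str, set[str]] = {w: set() for w in workers}
    for i, task in enumerate(sorted(tasks)):
        assignment[workers[i % len(workers)]].add(task)
    return assignment


def apply_assignment(
    running: dict[str, set[str]],
    assignment: dict[str, set[str]],
    revoke_unassigned: bool,
) -> dict[str, set[str]]:
    result: dict[str, set[str]] = {}
    for worker, assigned in assignment.items():
        previous = running.get(worker, set())
        if revoke_unassigned:
            # Worker stops anything that is not in its new assignment.
            result[worker] = set(assigned)
        else:
            # Behaviour described in the bug: old tasks keep running and
            # newly assigned tasks are started alongside them.
            result[worker] = previous | assigned
    return result


def duplicated_tasks(running: dict[str, set[str]]) -> set[str]:
    # A task is duplicated if more than one worker is running it.
    counts = Counter(t for tasks in running.values() for t in tasks)
    return {t for t, n in counts.items() if n > 1}


# State left behind by incremental rebalancing before worker-c joins.
running = {"worker-a": {"task-0", "task-1"}, "worker-b": {"task-2", "task-3"}}
workers = ["worker-a", "worker-b", "worker-c"]
tasks = ["task-0", "task-1", "task-2", "task-3"]
new_assignment = round_robin(tasks, workers)

buggy = apply_assignment(running, new_assignment, revoke_unassigned=False)
fixed = apply_assignment(running, new_assignment, revoke_unassigned=True)
print(sorted(duplicated_tasks(buggy)))  # ['task-1', 'task-2', 'task-3']
print(sorted(duplicated_tasks(fixed)))  # []
```

Because nothing was revoked before the eager round, three of the four tasks end up running on two workers at once in this scenario. This toy model ignores timing between workers, which a real fix also has to account for.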
> It's likely that this bug hasn't surfaced sooner because, on any subsequent
> rebalance, every worker in the cluster should proactively revoke all of its
> connectors and tasks before rejoining the group.
> [KIP-415|https://cwiki.apache.org/confluence/display/KAFKA/KIP-415%3A+Incremental+Cooperative+Rebalancing+in+Kafka+Connect#KIP415:IncrementalCooperativeRebalancinginKafkaConnect-Compatibility,Deprecation,andMigrationPlan]
> provides one way to address this:
> {quote}To downgrade your cluster to use protocol version 0 from version 1 or
> higher with {{eager}} rebalancing policy what is required is to switch one of
> the workers back to {{eager}} mode.
> {{connect.protocol = eager}}
> Once this worker joins, the group will downgrade to protocol version 0 and
> {{eager}} rebalancing policy, with immediate release of resources upon
> joining the group. This process will require a one-time double rebalancing,
> with the leader detecting the downgrade and first sending a downgraded
> assignment with empty assigned connectors and tasks and from then on just
> regular downgraded assignments.
> {quote}
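A sketch of the leader-side half of the plan quoted above. It assumes the leader remembers the protocol version used in the previous rebalance and can ask the group for one more round; how that second round is actually triggered is not specified here. `DowngradeAwareLeader`, `assign`, and the round-robin placement are hypothetical and only illustrate the "empty assignment first, regular assignments afterwards" sequence.

```python
from __future__ import annotations

EAGER = 0  # protocol version 0; versions >= 1 are incremental cooperative


class DowngradeAwareLeader:
    """Hypothetical leader following the downgrade plan quoted from KIP-415."""

    def __init__(self, previous_version: int) -> None:
        # Protocol version in effect for the last completed rebalance.
        self.previous_version = previous_version

    def assign(
        self, version: int, members: list[str], tasks: list[str]
    ) -> tuple[dict[str, set[str]], bool]:
        """Return (assignment, another_rebalance_needed) for one round."""
        downgraded = version == EAGER and self.previous_version > EAGER
        self.previous_version = version
        if downgraded:
            # First round after a downgrade: assign nothing, so every worker
            # releases what it was running, and request one more round.
            return {m: set() for m in members}, True
        # Regular eager round: distribute every task from scratch
        # (round-robin here, purely for illustration).
        assignment: dict[str, set[str]] = {m: set() for m in members}
        for i, task in enumerate(sorted(tasks)):
            assignment[members[i % len(members)]].add(task)
        return assignment, False


leader = DowngradeAwareLeader(previous_version=1)  # cluster was incremental
members = ["worker-a", "worker-b", "worker-c"]
tasks = ["task-0", "task-1", "task-2", "task-3"]

# An eager worker has joined, so this round runs with protocol version 0.
first, again_after_first = leader.assign(EAGER, members, tasks)
print(first, again_after_first)
# The follow-up round is a regular downgraded (eager) assignment.
second, again_after_second = leader.assign(EAGER, members, tasks)
print(second, again_after_second)
```

The empty first round is what makes this a "double" rebalance: no worker is handed a task until every worker has been told to release everything it held under the incremental protocol.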