[ https://issues.apache.org/jira/browse/KAFKA-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022251#comment-17022251 ]
Randall Hauch commented on KAFKA-9468:
--------------------------------------
Thanks, [~EeveeB]. I think it makes sense that the Connect worker should fail
upon startup if the number of partitions on the config topic is not 1, as this
can lead to serious problems, such as the endless rebalancing described below.
[~ewencp], do you think this requires a KIP? Technically it is changing
behavior, but distributed Connect is not really functional if the number of
config topic partitions is not 1, and the fact that we're not already checking
this is probably a bug that can be fixed without a KIP. WDYT?
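A minimal sketch of the fail-fast check proposed above, in Java (the language Connect is written in). The class and method names (`ConfigTopicCheck`, `validateConfigTopicPartitions`) are hypothetical and not part of Kafka's API. The partition count is passed in rather than fetched from a broker; in a real worker it would presumably come from describing the config topic through the Admin client.

```java
public class ConfigTopicCheck {

    /**
     * Hypothetical startup check: refuse to start when the topic configured via
     * config.storage.topic does not have exactly one partition. The partition
     * count is supplied by the caller (assumption: in a real worker it would be
     * looked up from the cluster before the herder starts).
     */
    static void validateConfigTopicPartitions(String topic, int partitionCount) {
        if (partitionCount != 1) {
            throw new IllegalStateException(String.format(
                "Config topic '%s' (config.storage.topic) must have exactly 1 partition, "
                    + "but has %d; refusing to start the worker.",
                topic, partitionCount));
        }
    }

    public static void main(String[] args) {
        // A correctly created config topic passes silently.
        validateConfigTopicPartitions("connect-configs", 1);

        // A misconfigured topic stops startup with an explicit reason in the log.
        try {
            validateConfigTopicPartitions("connect-configs", 3);
        } catch (IllegalStateException e) {
            System.out.println("Worker would refuse to start: " + e.getMessage());
        }
    }
}
```

Checking `!= 1` follows the wording of the comment (the issue says `> 1`, which is equivalent for any real topic). Failing at startup turns the misconfiguration into one explicit error instead of the repeated "Forcing rebalance" loop in the logs.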
> config.storage.topic partition count issue is hard to debug
> -----------------------------------------------------------
>
> Key: KAFKA-9468
> URL: https://issues.apache.org/jira/browse/KAFKA-9468
> Project: Kafka
> Issue Type: Improvement
> Components: KafkaConnect
> Affects Versions: 1.0.2, 1.1.1, 2.0.1, 2.1.1, 2.2.2, 2.4.0, 2.3.1
> Reporter: Evelyn Bayes
> Priority: Minor
>
> When you run Connect in distributed mode with 2 or more workers and
> config.storage.topic has more than 1 partition, one of the workers can end
> up rebalancing endlessly:
> [2020-01-13 12:53:23,535] INFO [Worker clientId=connect-1, groupId=connect-cluster] Current config state offset 37 is behind group assignment 63, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> [2020-01-13 12:53:23,584] INFO [Worker clientId=connect-1, groupId=connect-cluster] Finished reading to end of log and updated config snapshot, new config log offset: 37 (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
> [2020-01-13 12:53:23,584] INFO [Worker clientId=connect-1, groupId=connect-cluster] Current config state offset 37 does not match group assignment 63. Forcing rebalance. (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
>
> For anyone viewing this who doesn't know: this topic is only ever meant to
> be created with one partition.
>
> *Suggested Solution*
> Make the Connect worker check the partition count of the config topic when
> it starts; if the partition count is > 1, the worker should stop and log the
> reason why.
> I think this is reasonable, as it would stop users who are just starting out
> from setting it up incorrectly, and the problem would be easy to fix early.
> For those upgrading, this would easily be caught in a PRE-PROD environment.
> And even if they upgraded directly in PROD, they would only be impacted if
> they upgraded all Connect workers at the same time.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)