[ https://issues.apache.org/jira/browse/KAFKA-9468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Evelyn Bayes updated KAFKA-9468:
--------------------------------
    Description: 
When you run Connect in distributed mode with 2 or more workers and 
config.storage.topic has more than 1 partition, you can end up with one of the 
workers rebalancing endlessly:

[2020-01-13 12:53:23,535] INFO [Worker clientId=connect-1, groupId=connect-cluster] Current config state offset 37 is behind group assignment 63, reading to end of config log (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
[2020-01-13 12:53:23,584] INFO [Worker clientId=connect-1, groupId=connect-cluster] Finished reading to end of log and updated config snapshot, new config log offset: 37 (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
[2020-01-13 12:53:23,584] INFO [Worker clientId=connect-1, groupId=connect-cluster] Current config state offset 37 does not match group assignment 63. Forcing rebalance. (org.apache.kafka.connect.runtime.distributed.DistributedHerder)

 

In case anyone viewing this doesn't know: this topic is only ever meant to be 
created with exactly one partition. The workers reconcile on a single 
config-log offset when they join the group, which only makes sense with one 
partition; with multiple partitions the offsets never line up, so the lagging 
worker can never catch up.
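
For anyone setting the topic up by hand, here is a minimal sketch of creating 
it correctly with the AdminClient. The topic name, replication factor, and 
broker address are assumptions for illustration; Connect can also auto-create 
the topic from config.storage.topic and config.storage.replication.factor:

{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CreateConnectConfigTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Broker address is an assumption for this example.
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (AdminClient admin = AdminClient.create(props)) {
            // The config topic must have exactly ONE partition, and it
            // should be compacted and highly replicated.
            NewTopic configTopic = new NewTopic("connect-configs", 1, (short) 3)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                            TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(Collections.singleton(configTopic)).all().get();
        }
    }
}
{code}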

 

*Suggested Solution*

Make the Connect worker check the partition count of config.storage.topic when 
it starts; if the partition count is greater than 1, Kafka Connect stops and 
logs the reason why (see the sketch below).
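
A rough sketch of what such a startup check could look like, assuming the 
worker has access to an AdminClient. This is not the actual DistributedHerder 
code; the validator class and exception choice are illustrative:

{code:java}
import java.util.Collections;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.TopicDescription;
import org.apache.kafka.common.config.ConfigException;

// Hypothetical validator, not actual Connect code.
public class ConfigTopicValidator {
    public static void verifySinglePartition(AdminClient admin, String configTopic)
            throws Exception {
        TopicDescription description = admin
                .describeTopics(Collections.singleton(configTopic))
                .values().get(configTopic).get();
        int partitions = description.partitions().size();
        if (partitions > 1) {
            // Fail fast with an explanation instead of letting the worker
            // rebalance endlessly.
            throw new ConfigException("Topic '" + configTopic + "' has "
                    + partitions + " partitions, but config.storage.topic "
                    + "must have exactly 1 partition");
        }
    }
}
{code}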

I think this is reasonable, as it would stop users who are just starting out 
from building the cluster incorrectly, and the mistake would be easy to fix 
early. For those upgrading, this would easily be caught in a PRE-PROD 
environment. And even if they upgraded directly in PROD, they would only be 
impacted if they upgraded all Connect workers at the same time.


> config.storage.topic partition count issue is hard to debug
> -----------------------------------------------------------
>
>                 Key: KAFKA-9468
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9468
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 1.0.2, 1.1.1, 2.0.1, 2.1.1, 2.2.2, 2.4.0, 2.3.1
>            Reporter: Evelyn Bayes
>            Priority: Minor
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
