[ 
https://issues.apache.org/jira/browse/KAFKA-18669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anil Dasari updated KAFKA-18669:
--------------------------------
    Description: 
I am setting up multiple Kafka Connect (KC) clusters on Amazon ECS, each with 2 
nodes and one connector per cluster.

When all KC clusters and connectors are created simultaneously, one of the 
connectors unexpectedly restarts. The issue is reproducible with a minimum of 4 
clusters. Only one cluster's connector is restarted, even when the cluster 
count is increased to 10. 

I am unable to determine the root cause of this restart. It seems that the 
incremental cooperative rebalancing process is handling the request as if a 
configuration update were present, even though no actual config changes have 
occurred during the connector's startup phase.

Environment details
 # ECS KC cluster with node count 2
 # One connector per KC cluster
 # KC container Kafka version is 7.3.1-ccs (i.e., Kafka 3.3.1)
 # Kafka version: 3.7.x

 

Logs: (timestamp, log message, level, thread, instance id)

 
{code:java}
"Jan 29, 2025 @ 09:57:35.936","[Worker clientId=connect-1, 
groupId=pg-group-cdc-9e08297d427e4e36be3909a393f1a4b4] Starting connector 
c3ff31cf87ff4de0b63f3669546bf283",INFO,"StartAndStopExecutor-connect-1-1","i-0c2dd435cdbcec182"

"Jan 29, 2025 @ 09:57:35.938","Resolving value for path 
dbconfig/cdc/9e08297d427e4e36be3909a393f1a4b4 & property aliases: 
database_user,database_name,database_port,database_password,database_hostname",INFO,"StartAndStopExecutor-connect-1-1","i-0c2dd435cdbcec182"

"Jan 29, 2025 @ 09:57:38.414","Creating connector 
c3ff31cf87ff4de0b63f3669546bf283 of type 
com.abc.cdc.connect.postgres.CdcPostgresConnector",INFO,"StartAndStopExecutor-connect-1-1","i-0c2dd435cdbcec182"
 

"Jan 29, 2025 @ 09:57:38.426","Instantiated connector 
c3ff31cf87ff4de0b63f3669546bf283 with version 2.4.2.Final-cdc of type class 
com.abc.cdc.connect.postgres.CdcPostgresConnector",INFO,"StartAndStopExecutor-connect-1-1","i-0c2dd435cdbcec182"

"Jan 29, 2025 @ 09:57:38.432","Finished creating connector 
c3ff31cf87ff4de0b63f3669546bf283",INFO,"StartAndStopExecutor-connect-1-1","i-0c2dd435cdbcec182"

"Jan 29, 2025 @ 09:57:38.432","[Worker clientId=connect-1, 
groupId=pg-group-cdc-9e08297d427e4e36be3909a393f1a4b4] Finished starting 
connectors and tasks",INFO,"DistributedHerder-connect-1-1","i-0c2dd435cdbcec182"

"Jan 29, 2025 @ 09:57:38.432","[Worker clientId=connect-1, 
groupId=pg-group-cdc-9e08297d427e4e36be3909a393f1a4b4] Handling config updates 
with incremental cooperative 
rebalancing",TRACE,"DistributedHerder-connect-1-1","i-0c2dd435cdbcec182"

"Jan 29, 2025 @ 09:57:38.432","[Worker clientId=connect-1, 
groupId=pg-group-cdc-9e08297d427e4e36be3909a393f1a4b4] Requesting rebalance due 
to reconfiguration of tasks (needsReconfigRebalance: 
true)",DEBUG,"DistributedHerder-connect-1-1","i-0c2dd435cdbcec182"

"Jan 29, 2025 @ 09:57:38.432","[Worker clientId=connect-1, 
groupId=pg-group-cdc-9e08297d427e4e36be3909a393f1a4b4] Request joining group 
due to: connect worker requested 
rejoin",DEBUG,"DistributedHerder-connect-1-1","i-0c2dd435cdbcec182"

"Jan 29, 2025 @ 09:57:38.432","Running with cdc-debezium 
2.4.2.Final...",INFO,"connector-thread-c3ff31cf87ff4de0b63f3669546bf283","i-0c2dd435cdbcec182"

"Jan 29, 2025 @ 09:57:38.432","[Worker clientId=connect-1, 
groupId=pg-group-cdc-9e08297d427e4e36be3909a393f1a4b4] Processing connector 
config updates; currently-owned connectors are 
[c3ff31cf87ff4de0b63f3669546bf283], and to-be-updated connectors are 
[c3ff31cf87ff4de0b63f3669546bf283]",TRACE,"DistributedHerder-connect-1-1","i-0c2dd435cdbcec182"

"Jan 29, 2025 @ 09:57:38.433","[Worker clientId=connect-1, 
groupId=pg-group-cdc-9e08297d427e4e36be3909a393f1a4b4] Handling connector-only 
config update by restarting connector 
c3ff31cf87ff4de0b63f3669546bf283",INFO,"DistributedHerder-connect-1-1","i-0c2dd435cdbcec182"

"Jan 29, 2025 @ 09:57:38.433","Stopping connector 
c3ff31cf87ff4de0b63f3669546bf283",INFO,"DistributedHerder-connect-1-1","i-0c2dd435cdbcec182"{code}
I have attached both the successful and the failed KC connector logs.
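
To help compare the two traces, here is a minimal Python sketch that scans log rows for the pattern in the excerpt above: a "Starting connector" message followed by a "Handling connector-only config update by restarting connector" message for the same connector. It assumes rows have the five columns listed above (timestamp, log message, level, thread, instance id) and are in chronological order. The function name `find_early_restarts` and the inline sample are illustrative only, and the layout of the attached CSV files may differ.

```python
import csv
import io
import re

# Message fragments copied from the log excerpt above.
START_RE = re.compile(r"Starting connector (\S+)")
RESTART_RE = re.compile(
    r"Handling connector-only config update by restarting connector (\S+)"
)


def find_early_restarts(rows):
    """Return (connector, start_ts, restart_ts) for every connector that is
    restarted by a connector-only config update after its start was logged.

    `rows` yields (timestamp, message, level, thread, instance_id) in
    chronological order; this column layout is an assumption based on the
    excerpt, not a documented export format.
    """
    started = {}
    hits = []
    for timestamp, message, _level, _thread, _instance in rows:
        match = START_RE.search(message)
        if match:
            # Remember the first time each connector was started.
            started.setdefault(match.group(1), timestamp)
            continue
        match = RESTART_RE.search(message)
        if match and match.group(1) in started:
            hits.append((match.group(1), started[match.group(1)], timestamp))
    return hits


# Two rows from the excerpt, joined back onto single lines.
SAMPLE = (
    '"Jan 29, 2025 @ 09:57:35.936","[Worker clientId=connect-1, '
    'groupId=pg-group-cdc-9e08297d427e4e36be3909a393f1a4b4] Starting connector '
    'c3ff31cf87ff4de0b63f3669546bf283",INFO,'
    '"StartAndStopExecutor-connect-1-1","i-0c2dd435cdbcec182"\n'
    '"Jan 29, 2025 @ 09:57:38.433","[Worker clientId=connect-1, '
    'groupId=pg-group-cdc-9e08297d427e4e36be3909a393f1a4b4] Handling '
    'connector-only config update by restarting connector '
    'c3ff31cf87ff4de0b63f3669546bf283",INFO,'
    '"DistributedHerder-connect-1-1","i-0c2dd435cdbcec182"\n'
)

rows = list(csv.reader(io.StringIO(SAMPLE)))
result = find_early_restarts(rows)
for connector, start_ts, restart_ts in result:
    print(f"{connector}: started {start_ts}, restarted {restart_ts}")
```

Running the same function over both attached traces should flag only the failed cluster, provided the files follow the assumed column layout.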

 


> Connector is restarted before it is started
> -------------------------------------------
>
>                 Key: KAFKA-18669
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18669
>             Project: Kafka
>          Issue Type: Bug
>          Components: connect
>    Affects Versions: 3.3.1
>            Reporter: Anil Dasari
>            Priority: Major
>         Attachments: trace-failed-9e08297d427e4e36be3909a393f1a4b4.csv, 
> trace-successful-2465899102e54ff4bde6bd03b5c72a66.csv
>
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)