[ 
https://issues.apache.org/jira/browse/KAFKA-16563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16563.
-------------------------------
    Fix Version/s: 3.8.0
                   3.7.1
       Resolution: Fixed

> migration to KRaft hanging after MigrationClientException
> ---------------------------------------------------------
>
>                 Key: KAFKA-16563
>                 URL: https://issues.apache.org/jira/browse/KAFKA-16563
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.7.0
>            Reporter: Luke Chen
>            Assignee: Luke Chen
>            Priority: Major
>             Fix For: 3.8.0, 3.7.1
>
>
> While running the ZK to KRaft migration, we encountered an issue where the 
> migration hangs and the `ZkMigrationState` cannot move to the `MIGRATION` 
> state. After investigation, the root cause is that the pollEvent does not 
> retry on the retriable `MigrationClientException` (i.e. ZK client retriable 
> errors), although it should. As a result, the poll event no longer polls, 
> which prevents the KRaftMigrationDriver from working as expected.
>  
> {code:java}
> 2024-04-11 21:27:55,393 INFO [KRaftMigrationDriver id=5] Encountered ZooKeeper error during event PollEvent. Will retry. (org.apache.kafka.metadata.migration.KRaftMigrationDriver) [controller-5-migration-driver-event-handler]
> org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /migration
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:126)
>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:54)
>     at kafka.zookeeper.AsyncResponse.maybeThrow(ZooKeeperClient.scala:570)
>     at kafka.zk.KafkaZkClient.createInitialMigrationState(KafkaZkClient.scala:1701)
>     at kafka.zk.KafkaZkClient.getOrCreateMigrationState(KafkaZkClient.scala:1689)
>     at kafka.zk.ZkMigrationClient.$anonfun$getOrCreateMigrationRecoveryState$1(ZkMigrationClient.scala:109)
>     at kafka.zk.ZkMigrationClient.getOrCreateMigrationRecoveryState(ZkMigrationClient.scala:69)
>     at org.apache.kafka.metadata.migration.KRaftMigrationDriver.applyMigrationOperation(KRaftMigrationDriver.java:248)
>     at org.apache.kafka.metadata.migration.KRaftMigrationDriver.recoverMigrationStateFromZK(KRaftMigrationDriver.java:169)
>     at org.apache.kafka.metadata.migration.KRaftMigrationDriver.access$1900(KRaftMigrationDriver.java:62)
>     at org.apache.kafka.metadata.migration.KRaftMigrationDriver$PollEvent.run(KRaftMigrationDriver.java:794)
>     at org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
>     at org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>     at org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>     at java.base/java.lang.Thread.run(Thread.java:840)
> {code}
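
The failure mode described above (a poll that is not scheduled again after a retriable error) can be illustrated with a minimal, self-contained sketch. This is not the KRaftMigrationDriver code and not the actual fix: `PollRetrySketch`, `RetriableClientException`, `PollAction`, `simulate`, and `failFirstAttempt` are all hypothetical names invented for illustration, and the loop only models the idea of "keep polling after a retriable error".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Kafka code base.
class PollRetrySketch {

    /** Stand-in for a retriable client error (for example a transient ZK error). */
    static class RetriableClientException extends RuntimeException {
        RetriableClientException(String message) {
            super(message);
        }
    }

    /** One unit of work performed on each poll. */
    interface PollAction {
        void run(int attempt);
    }

    /**
     * Runs at most maxPolls polls and returns a log of what happened.
     * With rescheduleOnError == false, a retriable failure ends the loop,
     * which models the hang in the report. With true, the next poll still runs.
     */
    static List<String> simulate(PollAction action, int maxPolls, boolean rescheduleOnError) {
        List<String> log = new ArrayList<>();
        boolean scheduled = true;
        for (int attempt = 1; attempt <= maxPolls && scheduled; attempt++) {
            try {
                action.run(attempt);
                log.add("poll " + attempt + ": ok");
            } catch (RetriableClientException e) {
                log.add("poll " + attempt + ": retriable error");
                // Decide whether another poll is scheduled after the failure.
                scheduled = rescheduleOnError;
            }
        }
        return log;
    }

    /** Fails only on the first attempt, like a transient error. */
    static void failFirstAttempt(int attempt) {
        if (attempt == 1) {
            throw new RetriableClientException("transient failure");
        }
    }
}
```

Under these assumptions, `PollRetrySketch.simulate(PollRetrySketch::failFirstAttempt, 3, false)` stops after the first poll, while passing `true` lets polls 2 and 3 run and succeed.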



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
