[ https://issues.apache.org/jira/browse/KAFKA-18930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17932938#comment-17932938 ]

Luke Chen commented on KAFKA-18930:
-----------------------------------

[~davidarthur] [~mumrah], I'd like to hear your thoughts on this issue. Thanks.

> KRaft MigrationEvent won't retry when failing to write data to ZK 
> ------------------------------------------------------------------
>
>                 Key: KAFKA-18930
>                 URL: https://issues.apache.org/jira/browse/KAFKA-18930
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 3.9.0
>            Reporter: Luke Chen
>            Priority: Major
>
> When migrating from ZK to KRaft, the controller runs in dual-write mode. In
> that mode, metadata is written to KRaft first and then to ZK asynchronously.
> When a ZK write fails with an exception, the KRaft MigrationEvent does not
> retry it, which leaves the metadata in KRaft and ZK inconsistent.
>  
> Note:
> 1. In addition, during a clean shutdown of the KRaft controller, we should
> keep retrying the failed ZK write until a forced shutdown, to make sure the
> metadata stays consistent.
> 2. During shutdown, [the order of 
> shutdown|https://github.com/apache/kafka/blob/1ec1043d5197c4f807fa5cbc41d875b289443096/core/src/main/scala/kafka/server/ControllerServer.scala#L69-L76]
> is: close ZK -> close RPC client -> close migration driver. This causes a
> second problem: even if we retry the ZK write, it can never succeed once
> shutdown is under way, because the ZK connection has already been closed.
>  
> The impact is that when rolling back to ZK mode during the migration, the
> metadata in ZK is out of date.
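
One possible shape for the retry described in note 1, as a minimal sketch only: it assumes the failed ZK write can be wrapped in a hypothetical `ZkWrite` callback and retried with exponential backoff until it succeeds or a forced shutdown is requested. `RetryingZkWrite`, `writeWithRetry` and `forceShutdown` are illustrative names, not part of Kafka's migration driver API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: retry one failed ZK metadata write until success or forced shutdown. */
class RetryingZkWrite {

    /** Hypothetical stand-in for writing one metadata delta to ZK. */
    @FunctionalInterface
    interface ZkWrite {
        void apply() throws Exception;
    }

    private final AtomicBoolean forceShutdown = new AtomicBoolean(false);
    private final long initialBackoffMs;
    private final long maxBackoffMs;

    RetryingZkWrite(long initialBackoffMs, long maxBackoffMs) {
        this.initialBackoffMs = initialBackoffMs;
        this.maxBackoffMs = maxBackoffMs;
    }

    /** Called when the controller gives up on a clean shutdown. */
    void forceShutdown() {
        forceShutdown.set(true);
    }

    /**
     * Returns the number of attempts if the write eventually succeeded,
     * or -1 if a forced shutdown stopped the retries first.
     */
    int writeWithRetry(ZkWrite write) throws InterruptedException {
        long backoffMs = initialBackoffMs;
        int attempts = 0;
        while (!forceShutdown.get()) {
            attempts++;
            try {
                write.apply();
                return attempts; // this delta is now reflected in ZK
            } catch (InterruptedException ie) {
                throw ie; // do not swallow interrupts
            } catch (Exception e) {
                // Keep the event instead of dropping it: back off, then retry the same write.
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
        return -1;
    }
}
```

The sketch blocks on the same write rather than skipping it, so later deltas never reach ZK ahead of an earlier failed one. Whether the real driver should block like this or re-enqueue the event is left open by the report, and the loop can only succeed at shutdown if the ZK connection is still open at that point.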

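For note 2, a minimal sketch of the usual alternative: close components in the reverse of their startup order, so the migration driver (which depends on the RPC client and the ZK client) is closed first and can still reach ZK while it drains. This does not reproduce `ControllerServer.scala`; `ShutdownSequence`, `register`, `closeAll` and the component names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: close components in reverse order of startup (LIFO). */
class ShutdownSequence {

    /** A named component; close() stands in for the real resource cleanup. */
    interface Component extends AutoCloseable {
        String name();

        @Override
        void close();
    }

    private final Deque<Component> started = new ArrayDeque<>();

    /** Register a component once it has started; dependencies are registered first. */
    void register(Component c) {
        started.push(c);
    }

    /** Close everything last-started-first and return the names in close order. */
    List<String> closeAll() {
        List<String> order = new ArrayList<>();
        while (!started.isEmpty()) {
            Component c = started.pop();
            c.close();
            order.add(c.name());
        }
        return order;
    }

    /** Build a placeholder component with the given name. */
    static Component named(String name) {
        return new Component() {
            public String name() { return name; }
            public void close() { /* release resources for this component here */ }
        };
    }
}
```

Registering `zkClient`, then `rpcClient`, then `migrationDriver` yields the close order `migrationDriver`, `rpcClient`, `zkClient`, the reverse of the order described in the report.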


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
