[ https://issues.apache.org/jira/browse/KAFKA-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

fujian updated KAFKA-19371:
---------------------------
    Description: 
*[Precondition]*
The Kafka cluster already has remote storage enabled, using the internal-topic-based implementation. The core internal topic "__remote_log_metadata" has already been created.

*[Steps]*
1. Restart one broker of the Kafka cluster.
2. Check the log and the code path that creates "__remote_log_metadata" while the broker restarts.

*[Expected result]*
The broker should not call the API to create the topic, because the topic already exists.

*[Actual result]*
The result depends on how long the broker takes to start:

*Case 1: Happy path, when the restart takes a short time*
[2025-06-03 22:35:11,648] INFO Topic __remote_log_metadata exists. TopicId: 4CT2TTC-R6u7fNo_njYlDA, numPartitions: 50,

*Case 2: Unhappy path 1, when the restart takes some time*
[2025-06-03 23:59:40,505] INFO Topic __remote_log_metadata does not exist. Error: Timed out waiting for a node assignment. Call: listNodes
[2025-06-04 00:00:36,938] INFO Topic [__remote_log_metadata] already exists

*Case 3: Unhappy path 2, when the restart takes a long time*
[2025-06-03 21:57:21,151] INFO Topic __remote_log_metadata does not exist. Error: Timed out waiting for a node assignment. Call: listNodes at
[2025-06-03 21:58:21,153] ERROR Encountered error while creating __remote_log_metadata topic. java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: createTopics at

From the log and the current code, we can see that case 2 and case 3 both report "the topic does not exist" and then try to call the topic-creation API. That call is useless and contradicts the fact that the topic already exists. In case 2 in particular, the log reports both that the topic does not exist and that it already exists.

*[Root cause analysis]*
After reviewing the related code (TopicBasedRemoteLogMetadataManager#doesTopicExist), the way it decides whether a topic exists is wrong: as the case 2 and case 3 logs show, a lookup that times out is reported as "does not exist". I have created a PR to fix this minor bug: [When a broker restarts, it should not attempt to create the __remote_log_metadata topic if it already exists. by jiafu1115 · Pull Request #19899 · apache/kafka|https://github.com/apache/kafka/pull/19899]. Thanks.

FYI: Why do we get the timeout exception?
This is normal. While the broker is restarting, the connection used to query or create the topic in "TopicBasedRemoteLogMetadataManager#initializeResources" fails until the broker itself is ready.
[2025-06-03 23:21:20,752] WARN [AdminClient clientId=adminclient-1] Connection to node -1 (10.20.1.125:9559) could not be established. Node may not be available.
[2025-06-03 23:21:21,282] INFO [BrokerServer id=2] Transition from STARTING to STARTED (kafka.server.BrokerServer)

> When a broker restarts, it should not attempt to create the __remote_log_metadata topic if it already exists.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-19371
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19371
>             Project: Kafka
>          Issue Type: Bug
>          Components: Tiered-Storage
>    Affects Versions: 3.9.0, 4.0.0
>            Reporter: fujian
>            Priority: Major
>   Original Estimate: 72h
>  Remaining Estimate: 72h

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
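
To illustrate the root cause described in the issue (TopicBasedRemoteLogMetadataManager#doesTopicExist reporting a timed-out lookup as "does not exist"), here is a minimal sketch that does not use the Kafka client. It assumes the fix amounts to separating "the topic is confirmed missing" from "the lookup failed for another reason". The names TopicState, TopicNotFoundException, checkTopic and shouldCreateTopic are hypothetical; they are not taken from the Kafka code base or from PR #19899.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

public class TopicExistenceSketch {

    /** Three outcomes instead of a boolean, so a failed lookup is not read as "missing". */
    enum TopicState { EXISTS, MISSING, UNKNOWN }

    /** Hypothetical stand-in for the error a cluster returns when a topic is not found. */
    static class TopicNotFoundException extends Exception {
        TopicNotFoundException(String message) {
            super(message);
        }
    }

    /**
     * Runs a hypothetical topic lookup and classifies the result.
     * Only an explicit "not found" cause counts as MISSING; a timeout or any
     * other failure (for example, the broker is still starting) is UNKNOWN.
     */
    static TopicState checkTopic(Callable<String> lookup) {
        try {
            lookup.call();
            return TopicState.EXISTS;
        } catch (ExecutionException e) {
            return (e.getCause() instanceof TopicNotFoundException)
                    ? TopicState.MISSING
                    : TopicState.UNKNOWN;
        } catch (Exception e) {
            // Any other failure tells us nothing about whether the topic exists.
            return TopicState.UNKNOWN;
        }
    }

    /** Topic creation is attempted only when the topic is confirmed missing. */
    static boolean shouldCreateTopic(TopicState state) {
        return state == TopicState.MISSING;
    }

    public static void main(String[] args) {
        // Simulates the restart case from the issue: the lookup times out.
        TopicState duringRestart = checkTopic(() -> {
            throw new ExecutionException(
                    new TimeoutException("Timed out waiting for a node assignment."));
        });
        System.out.println("state=" + duringRestart
                + ", create=" + shouldCreateTopic(duringRestart));
    }
}
```

With a classification like this, a caller could retry the lookup while the state is UNKNOWN instead of calling the topic-creation API, which is the behaviour the issue asks for in cases 2 and 3.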