[ 
https://issues.apache.org/jira/browse/KAFKA-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

fujian updated KAFKA-19371:
---------------------------
    Description: 
*[Precondition]*
The Kafka cluster already has the remote storage feature enabled, using the 
internal-topic-based implementation. The core internal topic 
"__remote_log_metadata" has already been created.
 
*[Steps]*

1. Restart one broker of the Kafka cluster.

2. Check the log and the code logic for the creation of 
"__remote_log_metadata" during the broker restart.
 
*[Expected result]*
The broker should not call the API to create the topic, because the topic 
already exists.
 
*[Actual result]*
The result depends on how long the broker takes to start:
*Case 1: Happy path, when the restart takes a short time*
[2025-06-03 22:35:11,648] INFO Topic __remote_log_metadata 
{color:#00875a}exists{color}. TopicId: 4CT2TTC-R6u7fNo_njYlDA, numPartitions: 
50,

*Case 2: Unhappy path 1, when the restart takes some time*
[2025-06-03 23:59:40,505] INFO Topic __remote_log_metadata{color:#de350b} does 
not exist{color}. Error: Timed out waiting for a node assignment. Call: 
listNodes
[2025-06-04 00:00:36,938] INFO Topic [__remote_log_metadata] 
{color:#de350b}already exists{color}
*Case 3: Unhappy path 2, when the restart takes a long time*
[2025-06-03 21:57:21,151] INFO Topic __remote_log_metadata {color:#de350b}does 
not exist{color}. Error: {color:#de350b}Timed out waiting{color} for a node 
assignment. Call: {color:#de350b}listNodes {color}at
[2025-06-03 21:58:21,153] ERROR Encountered error while creating 
__remote_log_metadata topic. java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.{color:#de350b}TimeoutException{color}: Timed 
out waiting for a node assignment. Call: {color:#de350b}createTopics {color}at
 
From the log and the current code, we can see that {color:#de350b}cases 2 and 
3 both report "the topic does not exist" and then try to call the topic 
creation API. In fact, this call is useless and contradicts the fact that the 
topic already exists. In particular, the log in case 2 says that the topic 
both does not exist and already exists.{color}
 
*[Root cause analysis]*
After reviewing the related code 
(TopicBasedRemoteLogMetadataManager#doesTopicExist), I found that it uses a 
{color:#de350b}wrong implementation{color} to judge whether a topic exists.
I have created [PR #19899|https://github.com/apache/kafka/pull/19899] to fix 
this minor bug. Thanks.
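
To illustrate the distinction described above, here is a minimal, 
self-contained sketch. It is not the code from 
TopicBasedRemoteLogMetadataManager or from the PR. The names TopicState, 
TopicExistenceCheck, classify and UnknownTopicException are hypothetical 
stand-ins, and the lookup is modelled as a plain Callable rather than a real 
admin client call. The only point is that a lookup that times out is reported 
as UNKNOWN instead of "does not exist".

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the client's "unknown topic" error.
class UnknownTopicException extends RuntimeException {
    UnknownTopicException(String message) { super(message); }
}

// Three outcomes instead of a boolean, so that "could not ask the cluster"
// is not confused with "the topic does not exist".
enum TopicState { EXISTS, MISSING, UNKNOWN }

class TopicExistenceCheck {
    // `describe` is a hypothetical lookup: it returns normally when the topic
    // is found and throws ExecutionException wrapping the real cause otherwise.
    static TopicState classify(Callable<?> describe) {
        try {
            describe.call();
            return TopicState.EXISTS;
        } catch (ExecutionException e) {
            // Only an explicit "unknown topic" answer means the topic is missing.
            if (e.getCause() instanceof UnknownTopicException) {
                return TopicState.MISSING;
            }
            // Timeouts and any other failure leave the question undecided.
            return TopicState.UNKNOWN;
        } catch (Exception e) {
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
            }
            return TopicState.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(() -> "found")); // EXISTS
        System.out.println(classify(() -> {
            throw new ExecutionException(new UnknownTopicException("no such topic"));
        })); // MISSING
        System.out.println(classify(() -> {
            throw new ExecutionException(
                new TimeoutException("Timed out waiting for a node assignment."));
        })); // UNKNOWN
    }
}
```

With a result like this, a caller would attempt topic creation only for 
MISSING, which would avoid the contradictory log lines seen in cases 2 and 3.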
 
FYI:
Why do we get the timeout exception?
This is normal, for the following reason:
When a broker restarts, the connection used to query/create the topic in 
"TopicBasedRemoteLogMetadataManager#initializeResources" fails until the 
broker itself is ready.
[2025-06-03 23:21:20,752] WARN [AdminClient clientId=adminclient-1] Connection 
to node -1 (10.20.1.125:9559) could not be established. 
Node may not be available.
[2025-06-03 23:21:21,282] INFO [BrokerServer id=2] Transition from STARTING to 
STARTED (kafka.server.BrokerServer)
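
One way a caller could tolerate this startup window is to repeat the check 
until it gives a definite answer. The sketch below is only an illustration 
under that assumption, not the actual initializeResources logic. StartupRetry, 
retry and the probe supplier are hypothetical names. The probe returns an 
empty Optional while the broker is not ready yet.

```java
import java.util.Optional;
import java.util.function.Supplier;

class StartupRetry {
    // Calls `probe` until it yields a definite answer or attempts run out.
    // An empty Optional from the probe means "not answerable yet".
    static <T> Optional<T> retry(Supplier<Optional<T>> probe,
                                 int maxAttempts,
                                 long backoffMs) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> answer = probe.get();
            if (answer.isPresent()) {
                return answer;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMs); // wait before asking again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return Optional.empty();
                }
            }
        }
        // Still undecided: the caller should not treat this as "topic missing".
        return Optional.empty();
    }
}
```

An empty result here means "still unknown", so the caller would skip topic 
creation rather than assume the topic is absent.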


> When a broker restarts, it should not attempt to create the 
> __remote_log_metadata topic if it already exists.
> -------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-19371
>                 URL: https://issues.apache.org/jira/browse/KAFKA-19371
>             Project: Kafka
>          Issue Type: Bug
>          Components: Tiered-Storage
>    Affects Versions: 3.9.0, 4.0.0
>            Reporter: fujian
>            Priority: Major
>   Original Estimate: 72h
>  Remaining Estimate: 72h



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
