[jira] [Updated] (KAFKA-17190) AssignmentsManager gets stuck retrying on deleted topics

Colin McCabe (Jira) Tue, 23 Jul 2024 20:56:00 -0700


     [ https://issues.apache.org/jira/browse/KAFKA-17190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Colin McCabe updated KAFKA-17190:
---------------------------------
    Description: 
In MetadataVersion 3.7-IV2 and above, the AssignmentsManager on the broker
sends an RPC to the controller to report which directory the broker has chosen
for a replica. Unfortunately, the code does not check whether the topic still
exists in the MetadataImage before sending the RPC, and it retries
indefinitely. As a result, when a topic is created and deleted in rapid
succession, the broker can get stuck retrying the AssignReplicasToDirsRequest
forever.

To prevent this problem, the AssignmentsManager should check that the topic
still exists (and is still present on the broker in question) before sending
the RPC. To avoid log spam, we should not log any error messages until several
minutes have passed without success. Finally, rather than creating a new
EventQueue event for each assignment request, we should simply modify a shared
data structure and schedule a deferred event to send the accumulated RPCs,
which avoids queuing one event per request.
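
The three proposed changes can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the actual AssignmentsManager code:
the class name AssignmentBatcherSketch, its methods, the topicExists predicate
(a stand-in for a MetadataImage lookup), and the 5-minute threshold are all
hypothetical, and the real RPC and EventQueue are left out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Predicate;

/** Hypothetical sketch only; not the real Kafka AssignmentsManager. */
class AssignmentBatcherSketch {
    // Assumed value standing in for "several minutes".
    static final long ERROR_LOG_DELAY_MS = 5 * 60 * 1000L;

    // Shared structure: topic -> partition -> chosen directory id.
    private final Map<String, Map<Integer, String>> pending = new TreeMap<>();
    // Stand-in for "does the topic still exist in the MetadataImage?".
    private final Predicate<String> topicExists;
    private boolean flushScheduled = false;
    private long firstFailureMs = -1L;

    AssignmentBatcherSketch(Predicate<String> topicExists) {
        this.topicExists = topicExists;
    }

    /** Records an assignment. Returns true only if the caller should
     *  schedule a deferred flush (one is not already outstanding). */
    synchronized boolean onAssignment(String topic, int partition, String dirId) {
        pending.computeIfAbsent(topic, t -> new TreeMap<>()).put(partition, dirId);
        if (flushScheduled) {
            return false;
        }
        flushScheduled = true;
        return true;
    }

    /** Body of the deferred event: drops assignments for deleted topics and
     *  returns a copy of what remains, to be sent as one batch. */
    synchronized Map<String, Map<Integer, String>> buildBatch() {
        flushScheduled = false;
        pending.keySet().removeIf(topic -> !topicExists.test(topic));
        Map<String, Map<Integer, String>> batch = new TreeMap<>();
        pending.forEach((topic, parts) -> batch.put(topic, new TreeMap<>(parts)));
        return batch;
    }

    /** Removes acknowledged entries, unless they changed after being sent. */
    synchronized void onSuccess(Map<String, Map<Integer, String>> sent) {
        sent.forEach((topic, parts) -> {
            Map<Integer, String> current = pending.get(topic);
            if (current == null) {
                return;
            }
            parts.forEach((partition, dirId) -> current.remove(partition, dirId));
            if (current.isEmpty()) {
                pending.remove(topic);
            }
        });
        firstFailureMs = -1L;
    }

    /** Returns true once failures have persisted long enough to log an error. */
    synchronized boolean onFailure(long nowMs) {
        if (firstFailureMs < 0) {
            firstFailureMs = nowMs;
        }
        return nowMs - firstFailureMs >= ERROR_LOG_DELAY_MS;
    }

    public static void main(String[] args) {
        Set<String> liveTopics = new HashSet<>(Arrays.asList("foo", "bar"));
        AssignmentBatcherSketch sketch = new AssignmentBatcherSketch(liveTopics::contains);
        sketch.onAssignment("foo", 0, "dirA");
        sketch.onAssignment("bar", 1, "dirB");
        liveTopics.remove("foo"); // topic deleted before the deferred flush runs
        System.out.println(sketch.buildBatch()); // prints {bar={1=dirB}}
    }
}
```

In this sketch a deleted topic simply falls out of the pending map on the next
flush, so nothing is left to retry for it, and many assignment requests
arriving close together share a single deferred flush.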

  was: In MetadataVersion 3.7-IV2


> AssignmentsManager gets stuck retrying on deleted topics
> --------------------------------------------------------
>
>                 Key: KAFKA-17190
>                 URL: https://issues.apache.org/jira/browse/KAFKA-17190
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Colin McCabe
>            Assignee: Colin McCabe
>            Priority: Critical



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
