LB-Yu opened a new issue, #1476:
URL: https://github.com/apache/fluss/issues/1476

   ### Search before asking
   
   - [x] I searched in the [issues](https://github.com/alibaba/fluss/issues) 
and found nothing similar.
   
   
   ### Description
   
   We found that the processing of `DeleteReplicaResponseReceivedEvent` can be blocked for a long time in some cases, especially when handling partitions of KV tables.
   <img width="1502" height="614" alt="Image" src="https://github.com/user-attachments/assets/a66d7c93-1e20-4995-84f3-67566dc7c7d2" />
   
   After some debugging, I found that the problem is in `ZooKeeperClient#deletePartitionAssignment`. This method recursively deletes all children of the partition assignment ZNode, which can take a long time. In my local tests, deleting a parent node with three levels of child nodes (1,024 first-level child nodes, each with 4 second-level child nodes, each of which has 5 third-level child nodes) takes more than 5 seconds.
   ```java
       @Test
       void test() throws Exception {
           String basePath = "/perfTest";
           int firstLevel = 1024;
           int secondLevel = 4;
           int thirdLevel = 5;
   
           CuratorFramework client = zookeeperClient.getCuratorClient();
   
           for (int i = 0; i < firstLevel; i++) {
               String level1 = basePath + "/n1_" + i;
               client.create().creatingParentsIfNeeded().forPath(level1);
               for (int j = 0; j < secondLevel; j++) {
                   String level2 = level1 + "/n2_" + j;
                   client.create().creatingParentsIfNeeded().forPath(level2);
                   for (int k = 0; k < thirdLevel; k++) {
                       String level3 = level2 + "/n3_" + k;
                       client.create().creatingParentsIfNeeded().forPath(level3, ("dummy").getBytes());
                   }
               }
           }
   
           long start = System.currentTimeMillis();
           client.delete()
                   .deletingChildrenIfNeeded()
                   .forPath(basePath);
           long end = System.currentTimeMillis();
           System.out.println("Delete finished. Cost: " + (end - start) + " ms");
       }
   ```
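   
   To put the test above in perspective, here is a small sketch that counts how many znodes the recursive delete has to remove. The class and method names (`ZnodeCount`, `totalNodes`) are made up for illustration and are not part of Fluss or Curator. It assumes the recursive delete removes znodes one at a time, which I believe is how Curator's `deletingChildrenIfNeeded()` works, but I have not verified the exact number of round trips.
   
   ```java
   public class ZnodeCount {
       // Total znodes in a tree: the root plus every node on each level.
       // fanouts[i] is the number of children each node on level i has.
       static long totalNodes(int... fanouts) {
           long total = 1;      // the parent (root) node itself
           long levelSize = 1;  // number of nodes on the current level
           for (int fanout : fanouts) {
               levelSize *= fanout;
               total += levelSize;
           }
           return total;
       }
   
       public static void main(String[] args) {
           // Same shape as the test: 1024 x 4 x 5
           long nodes = totalNodes(1024, 4, 5);
           System.out.println("znodes to delete: " + nodes);
       }
   }
   ```
   
   For the 1024 x 4 x 5 layout this gives 1 + 1,024 + 4,096 + 20,480 = 25,601 znodes, so the 5+ seconds measured above works out to roughly 0.2 ms per node, all spent inside a single event handler.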
   
   For KV tables, there are many snapshot nodes, resulting in more child nodes 
compared to Log tables. Therefore, the deletion event takes longer, and this 
issue is more pronounced for KV tables.
   <img width="506" height="672" alt="Image" src="https://github.com/user-attachments/assets/e527ff67-d6c7-46e9-a11d-b6da6a375284" />
   
   ### Willingness to contribute
   
   - [x] I'm willing to submit a PR!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.