slfan1989 commented on PR #4988:
URL: https://github.com/apache/ozone/pull/4988#issuecomment-2371660310

   @xichen01 @adoroszlai 
   
   While using deletion, I noticed that it can be very slow, especially 
after we switched to the EC policy.
   
   Our Ozone01 cluster currently has about 1K machines. Initially, we used 
the `Ratis-3Replica` strategy, but for cost reasons, we gradually switched 
to the `EC-6-3` strategy in July.
   
   The following chart shows the deletion speed for `Ratis-3Replica`.
   
   
![image](https://github.com/user-attachments/assets/6da0b823-254e-4ea3-b91d-5675640dea0d)
   
   The following chart shows the deletion speed for `EC-6-3`.
   
   
![image](https://github.com/user-attachments/assets/0d0c688b-5a09-4aa9-9b02-2b9f0880f795)
   
   By reviewing the code and analyzing the logs, we found that the following 
situation can cause deletion to be very slow. We will illustrate this with an 
example.
   
   > Background
   
   We want to delete data from an EC container with ContainerId = 1000. Since 
it is EC-6-3, there are 9 replicas (DN1, DN2, DN3, ... DN9).
   
   > Process
   
   Before deletion, we first select a batch of DNs; in a given round, we may 
select only DN1 to DN6. We then send the deletion command to these 6 DNs, and 
the command executes normally, successfully deleting 6 blocks. However, if DN7 
to DN9 are not selected, the deletion process for this container gets stuck.
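
   To make the scenario concrete, here is a minimal standalone sketch in plain 
Java. It is not Ozone code: the class `PartialSelectionExample`, its helper 
methods, and the string DN names are hypothetical. It only computes which 
replicas of the EC-6-3 container receive no command when DN1 to DN6 are 
selected.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical standalone model; DNs are plain strings, not DatanodeDetails.
class PartialSelectionExample {

  // Returns the replica datanodes that are absent from the selected DN list.
  static Set<String> unselectedReplicas(Set<String> replicaDns,
      Set<String> selectedDns) {
    Set<String> missing = new LinkedHashSet<>(replicaDns);
    missing.removeAll(selectedDns);
    return missing;
  }

  // Builds the ordered set DN<from> .. DN<to>, inclusive.
  static Set<String> dnRange(int from, int to) {
    Set<String> dns = new LinkedHashSet<>();
    for (int i = from; i <= to; i++) {
      dns.add("DN" + i);
    }
    return dns;
  }

  public static void main(String[] args) {
    Set<String> replicas = dnRange(1, 9);  // EC-6-3: 9 replicas
    Set<String> selected = dnRange(1, 6);  // only DN1..DN6 were picked
    // Prints [DN7, DN8, DN9]: the replicas left without a delete command.
    System.out.println(unselectedReplicas(replicas, selected));
  }
}
```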
   
   > Code
   
   
https://github.com/apache/ozone/blob/1f86ce80bd775fc4403617f17aa272a0d1297c7f/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/block/DeletedBlockLogImpl.java#L294-L313
   
   I came up with a possible solution to eliminate this stuck situation: 
require that all replicas of the container to be deleted are present in the 
selected DN list at the same time. Otherwise, we skip that container.
   
   ```java
   private void getTransaction(DeletedBlocksTransaction tx,
         DatanodeDeletedBlockTransactions transactions,
         Set<DatanodeDetails> dnList, Set<ContainerReplica> replicas,
         Map<UUID, Map<Long, CmdStatus>> commandStatus) {
       DeletedBlocksTransaction updatedTxn =
           DeletedBlocksTransaction.newBuilder(tx)
               .setCount(transactionStatusManager.getOrDefaultRetryCount(
                 tx.getTxID(), 0))
               .build();
       
       // Requiring that all replicas be present in the DN list at the same
       // time ensures that the deletion commands for all replicas of the same
       // container can be issued at once, avoiding situations where some
       // replicas of the container are deleted while others are not.
       for (ContainerReplica replica : replicas) {
         DatanodeDetails datanodeDetails = replica.getDatanodeDetails();
         if (!dnList.contains(datanodeDetails)) {
           return;
         }
       }
   
       // Every replica's DN is known to be in dnList at this point, so no
       // further membership check is needed.
       for (ContainerReplica replica : replicas) {
         DatanodeDetails details = replica.getDatanodeDetails();
         if (!transactionStatusManager.isDuplication(
             details, updatedTxn.getTxID(), commandStatus)) {
           transactions.addTransactionToDN(details.getUuid(), updatedTxn);
         }
       }
     }
   ```
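
   The all-or-skip rule can also be looked at in isolation. The following is a 
rough standalone sketch, not the Ozone implementation: `AllOrSkipExample` and 
`targets` are hypothetical names, and DNs are modeled as plain strings. It 
assumes the only decision is "send to every replica, or send to none".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical standalone model of the proposed check; not Ozone code.
class AllOrSkipExample {

  // Returns the DNs that should receive the delete command: every replica
  // if all of them are in the selected list, otherwise an empty list
  // (the container is skipped for this round).
  static List<String> targets(Set<String> replicaDns,
      Set<String> selectedDns) {
    if (!selectedDns.containsAll(replicaDns)) {
      return List.of();
    }
    return new ArrayList<>(replicaDns);
  }

  public static void main(String[] args) {
    Set<String> replicas = Set.of("DN1", "DN2", "DN3");
    // Partial selection: container skipped, prints 0.
    System.out.println(targets(replicas, Set.of("DN1", "DN2")).size());
    // Full selection: all replicas targeted, prints 3.
    System.out.println(
        targets(replicas, Set.of("DN1", "DN2", "DN3", "DN4")).size());
  }
}
```

   One thing that may be worth checking with this rule: a container whose 
replicas are not all selected in a given round is skipped for that round, so 
its deletion depends on a later round selecting all of its DNs together.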
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

