ivandika3 commented on PR #9284:
URL: https://github.com/apache/ozone/pull/9284#issuecomment-3530880434

   > Could you be more specific on this?
   
   Ok, after revisiting the block deletion flow, I think this patch makes sense.
   
   Below is my reasoning; I might be wrong, since the block deletion flow is quite
complex.
   
   Quoting from
https://ozone.apache.org/assets/ApacheOzoneBestPracticesAtDidi.pdf (Efficient
Deletion):
   
   > Upon analyzing the code, we identified a critical detail. When SCM issues 
deletion requests for
   block replicas, it does not dispatch all requests to the DataNodes (DNs) at 
once. For example, in
   the case of EC data using the EC-6-3-1024K format, deletion requires 
removing 9 replicas.
   However, SCM often only issues deletion requests for 5 or 6 of them. As a 
result, even if the DNs
   successfully delete those replicas, the remaining 3 are not marked as 
deleted. These replicas are
   not cleared in time and remain until the system times out and re-initiates 
the deletion process.
   
   HDDS-11498 argues that deletion needs to be sent to all replicas or not at
all, so that `SCMDeletedBlockTransactionStatusManager#commitTransactions` can
remove the delete transaction from the DB. However, this means that for a
container with any replica DN that is either
   1. not `IN_SERVICE` and `HEALTHY`, or
   2. behind a full SCM or DN command queue
(`SCMBlockDeletingService#getDatanodesWithinCommandLimit`, see HDDS-8888),
   the deletion will not be sent at all and therefore stays in the delete
transactions DB. This causes the issue we faced (containers with replicas on
decommissioned datanodes are skipped). Note that condition 1 (unhealthy DNs)
should already be handled by `ReplicationManager#getContainerReplicationHealth`
even before HDDS-11498, so I think HDDS-11498 is primarily aimed at condition 2
(a full SCM or DN command queue).
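   To make the two conditions concrete, here is a minimal, self-contained sketch of the per-replica eligibility check and the all-or-nothing rule described above. All names (`DnEligibilitySketch`, `eligible`, `COMMAND_LIMIT`, the enums) are illustrative stand-ins, not the actual Ozone code:

   ```java
   import java.util.*;
   import java.util.stream.*;

   // Hypothetical sketch; names are illustrative, not the actual Ozone classes.
   public class DnEligibilitySketch {
       enum OpState { IN_SERVICE, DECOMMISSIONING, DECOMMISSIONED }
       enum Health { HEALTHY, STALE, DEAD }

       record Datanode(String id, OpState opState, Health health, int queuedCommands) {}

       static final int COMMAND_LIMIT = 5; // assumed stand-in for the configured limit

       // A DN can receive a delete-block command only if it is IN_SERVICE,
       // HEALTHY, and its command queue is below the limit (HDDS-8888).
       static boolean eligible(Datanode dn) {
           return dn.opState() == OpState.IN_SERVICE
               && dn.health() == Health.HEALTHY
               && dn.queuedCommands() < COMMAND_LIMIT;
       }

       static List<Datanode> eligibleReplicas(List<Datanode> replicas) {
           return replicas.stream()
               .filter(DnEligibilitySketch::eligible)
               .collect(Collectors.toList());
       }

       public static void main(String[] args) {
           List<Datanode> replicas = List.of(
               new Datanode("dn1", OpState.IN_SERVICE, Health.HEALTHY, 0),
               new Datanode("dn2", OpState.IN_SERVICE, Health.HEALTHY, 9),      // queue full
               new Datanode("dn3", OpState.DECOMMISSIONED, Health.HEALTHY, 0)); // condition 1
           List<Datanode> eligible = eligibleReplicas(replicas);
           System.out.println(eligible.size()); // 1: only dn1 passes both checks

           // HDDS-11498's all-or-nothing rule: if any replica DN is ineligible,
           // no deletes are sent and the transaction stays in the DB.
           System.out.println(eligible.size() == replicas.size()); // false
       }
   }
   ```

   Under this rule, a single decommissioned or congested replica DN (like dn2 or dn3 above) keeps the whole delete transaction pending.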
   
   In this patch, we exclude these non-healthy DNs when calculating the
inadequate replicas. So for a container with 3 HEALTHY replicas and 1
DECOMMISSIONED replica [DN1 (HEALTHY), DN2 (HEALTHY), DN3 (HEALTHY), DN4
(DECOMMISSIONED)], we filter the DN list to [DN1, DN2, DN3] since
`includedDnSet` does not contain the decommissioned DN. Because the
`ReplicationManager#getContainerReplicationHealth` result should then be healthy
and the container is considered to have enough replicas, we still send the
deletion commands to all the DNs (including the DECOMMISSIONED DN), and deletion
is not blocked. This should still preserve the HDDS-11498 behavior: if the
command queues of some DNs hosting HEALTHY replicas are full, those DNs will be
excluded, `getContainerReplicationHealth` will return `UNDER_REPLICATED`, and
deletion will be skipped.
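   A small sketch of the counting idea above, under my assumptions (names like `countedReplicas`, `Replica`, and `replicationFactor` are hypothetical, not the patch's actual code): only IN_SERVICE DNs with room in their command queue count toward the replication-health check, so a decommissioned replica no longer blocks deletion, while a congested healthy replica still does.

   ```java
   import java.util.*;

   // Illustrative sketch; not the actual patch code.
   public class ReplicaFilterSketch {
       enum OpState { IN_SERVICE, DECOMMISSIONED }
       record Replica(String dn, OpState opState, boolean commandQueueFull) {}

       // Only IN_SERVICE DNs with queue capacity count toward the
       // replication-health check (the "includedDnSet" filtering idea).
       static long countedReplicas(List<Replica> replicas) {
           return replicas.stream()
               .filter(r -> r.opState() == OpState.IN_SERVICE && !r.commandQueueFull())
               .count();
       }

       public static void main(String[] args) {
           int replicationFactor = 3; // e.g. a Ratis THREE container

           // Case from the comment: 3 HEALTHY + 1 DECOMMISSIONED replica.
           List<Replica> replicas = List.of(
               new Replica("dn1", OpState.IN_SERVICE, false),
               new Replica("dn2", OpState.IN_SERVICE, false),
               new Replica("dn3", OpState.IN_SERVICE, false),
               new Replica("dn4", OpState.DECOMMISSIONED, false));
           // 3 counted replicas >= factor: container considered healthy,
           // deletes go to all 4 DNs, decommissioned one included.
           System.out.println(countedReplicas(replicas) >= replicationFactor); // true

           // If one IN_SERVICE DN's queue is full, the count drops below the
           // factor -> UNDER_REPLICATED -> deletion skipped (HDDS-11498 kept).
           List<Replica> congested = List.of(
               new Replica("dn1", OpState.IN_SERVICE, true),
               new Replica("dn2", OpState.IN_SERVICE, false),
               new Replica("dn3", OpState.IN_SERVICE, false),
               new Replica("dn4", OpState.DECOMMISSIONED, false));
           System.out.println(countedReplicas(congested) >= replicationFactor); // false
       }
   }
   ```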
   
   That said, I think it would still be good to write tests for both HDDS-11498
(@slfan1989 can raise a ticket for that test) and this patch (HDDS-13914).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

