ivandika3 commented on PR #9284: URL: https://github.com/apache/ozone/pull/9284#issuecomment-3530880434
> Could you be more specific on this?

Ok, after revisiting the block deletion flow, I think this patch makes sense. Below is my reasoning; I might be wrong, since the block deletion flow is quite complex.

The "Efficient Deletion" section of https://ozone.apache.org/assets/ApacheOzoneBestPracticesAtDidi.pdf says:

> Upon analyzing the code, we identified a critical detail. When SCM issues deletion requests for block replicas, it does not dispatch all requests to the DataNodes (DNs) at once. For example, in the case of EC data using the EC-6-3-1024K format, deletion requires removing 9 replicas. However, SCM often only issues deletion requests for 5 or 6 of them. As a result, even if the DNs successfully delete those replicas, the remaining 3 are not marked as deleted. These replicas are not cleared in time and remain until the system times out and re-initiates the deletion process.

HDDS-11498 argues that deletion needs to be sent to all replicas or not at all, so that `SCMDeletedBlockTransactionStatusManager#commitTransactions` is able to remove the delete transaction from the DB. However, this means that for a container with any replica on a DN that is either

1. not `IN_SERVICE` and `HEALTHY`, or
2. over the SCM or DN command queue limit (`SCMBlockDeletingService#getDatanodesWithinCommandLimit`, see HDDS-8888),

the deletion will not be sent, and the transaction therefore stays in the delete transactions DB. This causes the issue that we faced (where a container with replicas on decommissioned datanodes is skipped).

Note that condition 1 (unhealthy DNs) should already be handled by `ReplicationManager#getContainerReplicationHealth`, even before HDDS-11498. Therefore I think HDDS-11498 is primarily aimed at handling condition 2 (the SCM or DN command queue is full).

In this patch, we exclude these non-healthy DNs when calculating the inadequate replicas.
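
To make the intended behaviour concrete, here is a rough sketch of the decision as I understand it. This is not the actual patch: `DeletionGateSketch`, `shouldSendDeletion` and `requiredReplicas` are hypothetical names, and a plain replica count stands in for the real `ReplicationManager#getContainerReplicationHealth` check. It assumes `includedDnSet` holds only the healthy, in-service DNs that are still within the command limit.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified, hypothetical model of the deletion gate; not Ozone code.
class DeletionGateSketch {

  /**
   * Counts only replicas hosted on DNs present in includedDnSet
   * (assumed: healthy, in-service, command queue not full).
   * Deletion is allowed when that count still satisfies the
   * replication requirement; otherwise the transaction is skipped.
   */
  static boolean shouldSendDeletion(List<String> replicaDns,
                                    Set<String> includedDnSet,
                                    int requiredReplicas) {
    long usable = replicaDns.stream()
        .filter(includedDnSet::contains)
        .count();
    return usable >= requiredReplicas;
  }

  public static void main(String[] args) {
    // DN4 hosts a replica but is DECOMMISSIONED, so it is never in the set.
    List<String> replicaDns = Arrays.asList("DN1", "DN2", "DN3", "DN4");

    // All healthy DNs are within the command limit: deletion is not blocked.
    Set<String> allHealthy = new HashSet<>(Arrays.asList("DN1", "DN2", "DN3"));
    System.out.println(shouldSendDeletion(replicaDns, allHealthy, 3)); // true

    // DN3's command queue is full, so it drops out: deletion is skipped.
    Set<String> dn3QueueFull = new HashSet<>(Arrays.asList("DN1", "DN2"));
    System.out.println(shouldSendDeletion(replicaDns, dn3QueueFull, 3)); // false
  }
}
```

The point of the sketch is only that a decommissioned DN no longer counts against the container, while a healthy DN with a full command queue still does.
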
So if we have a container with 3 HEALTHY replicas and 1 DECOMMISSIONED replica [DN1 (HEALTHY), DN2 (HEALTHY), DN3 (HEALTHY), DN4 (DECOMMISSIONED)], we filter the DN list to [DN1, DN2, DN3], since `includedDnSet` does not contain the decommissioned DN. Since the `ReplicationManager#getContainerReplicationHealth` result should be healthy and the container is considered to have enough replicas, we still send the deletion commands to all the DNs (including the DECOMMISSIONED DN) and deletion is not blocked.

However, this should still work for HDDS-11498: if the command queues of some DNs hosting HEALTHY replicas are full, those DNs are excluded, `getContainerReplicationHealth` now returns `UNDER_REPLICATED`, and deletion is skipped.

That said, I think it's still good to write some tests for both HDDS-11498 (@slfan1989 can raise a ticket for writing this test) and this (HDDS-13914).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
