sodonnel opened a new pull request, #7726: URL: https://github.com/apache/ozone/pull/7726
## What changes were proposed in this pull request?

We have seen some instances where delete container commands are picked from the command queue within the SCM-defined deadline, but then run for a very long time in the handler. This causes SCM to think the delete has been dropped or has failed when it is actually still running. The slow-running command could be caused by:

1. Something else holding a lock on the container for a long time, blocking the delete operation.
2. A slow disk causing the removal of the container files to take a very long time.

To compound this problem, an ICR (incremental container report) confirming the delete is not sent until the very last stage of the delete process.

To combat this, two changes are included in this PR:

1. Introduce a lock timeout of 60 seconds. If it takes longer than this for the lock and pre-checks to complete, the container delete is skipped.
2. Move the ICR to immediately after the point where the container is removed from the container set. At this stage, there is no way to recover the container without a DN restart, so it makes sense to inform SCM that the container is logically removed ASAP.

## What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-12114

## How was this patch tested?

New unit test added.
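
For readers unfamiliar with the pattern, the two changes described in this PR can be illustrated with a minimal, self-contained Java sketch. This is not the code from the patch: the class `DeleteContainerSketch`, its methods, and the `events` list are hypothetical stand-ins, and the real datanode handler, container set, and ICR sender are not modeled. The sketch only assumes what the description states: a bounded wait for the lock and pre-checks (60 seconds in the PR), and an ICR sent right after the container leaves the container set, before the slow file removal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration only; names do not come from the Ozone code base.
class DeleteContainerSketch {
    // The PR description states a 60 second timeout for the lock and pre-checks.
    static final long LOCK_TIMEOUT_SECONDS = 60;

    // Stand-in for the datanode's container set: container ID -> container lock.
    private final Map<Long, ReentrantLock> containerSet = new ConcurrentHashMap<>();

    // Records what happened, in order, so the sequence can be inspected.
    final List<String> events = new ArrayList<>();

    void addContainer(long id) {
        containerSet.put(id, new ReentrantLock());
    }

    ReentrantLock lockFor(long id) {
        return containerSet.get(id);
    }

    /** Returns true if the container was deleted, false if the delete was skipped. */
    boolean deleteContainer(long id, long timeout, TimeUnit unit) {
        ReentrantLock lock = containerSet.get(id);
        if (lock == null) {
            return false; // unknown container, nothing to delete
        }
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        boolean locked;
        try {
            // Change 1: wait for the lock for a bounded time instead of indefinitely.
            locked = lock.tryLock(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!locked) {
            events.add("skipped:" + id);
            return false;
        }
        try {
            // Placeholder for pre-checks; they count against the same deadline.
            if (System.nanoTime() - deadline > 0) {
                events.add("skipped:" + id);
                return false;
            }
            // Change 2: remove from the container set, then report immediately,
            // before the potentially slow removal of files from disk.
            containerSet.remove(id);
            events.add("icr:" + id);
            events.add("files-deleted:" + id); // stand-in for the on-disk delete
            return true;
        } finally {
            lock.unlock();
        }
    }
}
```

In this sketch, a caller would invoke `deleteContainer(id, DeleteContainerSketch.LOCK_TIMEOUT_SECONDS, TimeUnit.SECONDS)`. If another thread holds the container lock past the timeout, the call returns `false` and records a skip rather than blocking; otherwise the `icr` event is recorded before the `files-deleted` event, mirroring the reordering the PR describes.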
