[jira] [Updated] (HDDS-12114) Prevent delete commands running after a long lock wait and send ICR earlier

Stephen O'Donnell (Jira) Fri, 31 Jan 2025 04:11:49 -0800


     [ 
https://issues.apache.org/jira/browse/HDDS-12114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Stephen O'Donnell updated HDDS-12114:
-------------------------------------
    Target Version/s:   (was: 1.4.2)

> Prevent delete commands running after a long lock wait and send ICR earlier
> ---------------------------------------------------------------------------
>
>                 Key: HDDS-12114
>                 URL: https://issues.apache.org/jira/browse/HDDS-12114
>             Project: Apache Ozone
>          Issue Type: Bug
>          Components: Ozone Datanode
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 2.0.0, 1.4.2
>
>
> We have seen some instances where delete container commands are picked from 
> the command queue within the SCM-defined deadline. However, they run for a 
> very long time in the handler. This causes SCM to think the delete has been 
> dropped or failed, when it is actually still running.
> The causes of the slow-running command could be:
> 1. Something else has a lock on the container for a long time, blocking the 
> delete operation
> 2. Slow disk causing the removal of the container files to take a very long 
> time.
> To compound this problem, an ICR (incremental container report) confirming 
> the delete is not sent until the very last stage of the delete process.
> To combat this, two changes are included in this Jira:
> 1. Introduce a lock timeout of 60 seconds. If it takes longer than this for 
> the lock and pre-checks to complete, the container delete is skipped.
> 2. Move the ICR to immediately after the point where the container is removed 
> from the container set. At this stage, there is no way to recover the 
> container without a DN restart and it makes sense to inform SCM that the 
> container is logically removed ASAP.
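
A minimal Java sketch of the two changes described above, for illustration only. It assumes a per-container lock and an in-memory container set; every class, method, and field name here (DeleteContainerSketch, Container, deleteContainer, events) is hypothetical and not taken from the Ozone code base. The ICR and the file removal are represented only by entries in an event list so that the ordering is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Illustrative sketch only: all names are hypothetical, not Ozone's classes.
public class DeleteContainerSketch {

  /** Hypothetical stand-in for a container guarded by its own lock. */
  static final class Container {
    final long id;
    final ReentrantLock lock = new ReentrantLock();

    Container(long id) {
      this.id = id;
    }
  }

  private final Map<Long, Container> containerSet = new ConcurrentHashMap<>();

  // Records side effects in order, standing in for the real ICR and disk I/O.
  final List<String> events = new ArrayList<>();

  void add(Container c) {
    containerSet.put(c.id, c);
  }

  /**
   * Returns true if the delete ran, false if it was skipped because the
   * lock and pre-checks did not complete within the timeout.
   */
  boolean deleteContainer(long id, long timeout, TimeUnit unit) {
    Container c = containerSet.get(id);
    if (c == null) {
      return false;
    }
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    try {
      // Change 1: bound the wait for the lock instead of blocking forever.
      if (!c.lock.tryLock(timeout, unit)) {
        events.add("skipped:" + id);
        return false;
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      events.add("skipped:" + id);
      return false;
    }
    try {
      // Pre-checks would run here; skip if they pushed us past the deadline.
      if (System.nanoTime() - deadline > 0) {
        events.add("skipped:" + id);
        return false;
      }
      containerSet.remove(id);
      // Change 2: report the removal as soon as the container has left the set,
      events.add("icr:" + id);
      // and only then do the potentially slow removal of files on disk.
      events.add("files-removed:" + id);
      return true;
    } finally {
      c.lock.unlock();
    }
  }
}
```

In this sketch a caller would invoke deleteContainer(id, 60, TimeUnit.SECONDS) to mirror the 60 second timeout mentioned above. A skipped delete leaves the container in the set, so a later retry remains possible.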



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

