[ 
https://issues.apache.org/jira/browse/HADOOP-18639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sejin Hwang updated HADOOP-18639:
---------------------------------
    Description: 
The YARN NodeManager's deletion service has two types of deletion tasks: the 
FileDeletionTask for deleting log, usercache, and appcache files, and the 
DockerContainerDeletionTask for deleting Docker containers.
 
A FileDeletionTask is removed from the statestore when it completes, but a 
DockerContainerDeletionTask is not.
Completed DockerContainerDeletionTask entries therefore accumulate 
continuously in the statestore.
 
When the NodeManager restarts, its deletion service runs all of the 
DockerContainerDeletionTask entries that have accumulated in the statestore.

As a result, FileDeletionTask and DockerContainerDeletionTask execution is 
delayed unnecessarily while the accumulated tasks are processed, which can 
fill the disk in environments where a large number of containers are 
allocated and released.
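
To illustrate the accumulation, here is a minimal, self-contained sketch. It 
is not the NodeManager code: StateStore, Task, and leftoverRecords are 
hypothetical names invented for this example, and the store is just an 
in-memory map. It only models the difference between a task that removes its 
own record on completion and one that does not.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified model. These are NOT the real Hadoop classes.
class DeletionStateSketch {

    /** Stand-in for the NodeManager statestore: task id -> description. */
    static final class StateStore {
        private final Map<Integer, String> tasks = new LinkedHashMap<>();

        void store(int id, String description) { tasks.put(id, description); }
        void remove(int id) { tasks.remove(id); }
        int size() { return tasks.size(); }
    }

    /** A deletion task that may or may not clean up its record when done. */
    static final class Task {
        final int id;
        final String description;
        final boolean removesRecordOnCompletion;

        Task(int id, String description, boolean removesRecordOnCompletion) {
            this.id = id;
            this.description = description;
            this.removesRecordOnCompletion = removesRecordOnCompletion;
        }

        void run(StateStore store) {
            // The actual file or container deletion would happen here.
            if (removesRecordOnCompletion) {
                store.remove(id);
            }
        }
    }

    /** Schedules and completes count tasks; returns records left behind. */
    static int leftoverRecords(int count, boolean removesRecordOnCompletion) {
        StateStore store = new StateStore();
        for (int id = 0; id < count; id++) {
            Task task = new Task(id, "task-" + id, removesRecordOnCompletion);
            store.store(task.id, task.description); // persisted when scheduled
            task.run(store);                        // completed immediately
        }
        return store.size();
    }

    public static void main(String[] args) {
        // Behaviour described for FileDeletionTask: record removed when done.
        System.out.println("with removal:    " + leftoverRecords(1000, true));
        // Behaviour described for DockerContainerDeletionTask: record kept.
        System.out.println("without removal: " + leftoverRecords(1000, false));
    }
}
```

In this model every record left behind is one that a restart would load and 
run again, which is the backlog described above.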

I will attach a patch soon.

  was:
YARN NodeManager's deletion service has two types of deletion tasks: the 
FileDeletionTask for deleting log files and the DockerContainerDeletionTask for 
deleting Docker containers.
 
The FileDeletionTask is removed from the statestore when the task is completed, 
but the DockerContainerDeletionTask is not.
Therefore, the DockerContainerDeletionTask accumulates continuously in the 
statestore.
 
This causes the NodeManager's deletion service to run the accumulated 
DockerContainerDeletionTask in the statestore when the NodeManager restarts.

As a result, the FileDeletionTask and DockerContainerDeletionTask are delayed 
unnecessarily while processing accumulated tasks, which can cause disk full 
issues in environments where a large number of containers are allocated and 
released.

I will attach a patch soon


> DockerContainerDeletionTask is not removed from the NodeManager's statestore 
> when the task is completed.
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18639
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18639
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Sejin Hwang
>            Priority: Major
>              Labels: pull-request-available
>
> The YARN NodeManager's deletion service has two types of deletion tasks: the 
> FileDeletionTask for deleting log, usercache, and appcache files, and the 
> DockerContainerDeletionTask for deleting Docker containers.
>  
> A FileDeletionTask is removed from the statestore when it completes, but a 
> DockerContainerDeletionTask is not.
> Completed DockerContainerDeletionTask entries therefore accumulate 
> continuously in the statestore.
>  
> When the NodeManager restarts, its deletion service runs all of the 
> DockerContainerDeletionTask entries that have accumulated in the statestore.
>  
> As a result, FileDeletionTask and DockerContainerDeletionTask execution is 
> delayed unnecessarily while the accumulated tasks are processed, which can 
> fill the disk in environments where a large number of containers are 
> allocated and released.
>  
> I will attach a patch soon.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)