[ 
https://issues.apache.org/jira/browse/MESOS-396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607057#comment-13607057
 ] 

Benjamin Mahler commented on MESOS-396:
---------------------------------------

These are the two major components of the fix:
https://reviews.apache.org/r/10028/
https://reviews.apache.org/r/10032/
                
> Slave GarbageCollector needs to delete the parent executor directories. It 
> currently only deletes the executor run directories.
> -------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-396
>                 URL: https://issues.apache.org/jira/browse/MESOS-396
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benjamin Mahler
>            Assignee: Benjamin Mahler
>            Priority: Blocker
>
> As a result, long-lived slaves accumulate a large number of empty executor 
> directories. All that remains in each of these directories is a broken link 
> to the 'latest' run.
> Over time, as the number of empty executor directories on the slave 
> approaches LINK_MAX, mkdir fails and the slave crashes, as was found in 
> MESOS-391.
> The fix is to schedule the executor parent directories for deletion as well. 
> However, the GC module does not know whether a parent executor directory can 
> safely be deleted, because more tasks could be launched with the same 
> executor id after the directory has been scheduled for deletion.
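
To make the race in the description concrete, here is a minimal sketch of one way it could be handled: a pending deletion of the parent directory is cancelled whenever a new run starts under the same executor id. This is an illustration only and is not taken from the two reviews linked above. The names GarbageCollector, schedule, unschedule, prune, launch_run and the "runs" directory layout are all hypothetical.

```python
import os
import shutil
import tempfile


class GarbageCollector:
    """Toy GC (hypothetical): paths are scheduled for deletion and can be unscheduled."""

    def __init__(self):
        self._pending = set()

    def schedule(self, path):
        # Mark a directory for deletion at the next prune().
        self._pending.add(path)

    def unschedule(self, path):
        # Cancel a pending deletion; returns True if one was cancelled.
        if path in self._pending:
            self._pending.remove(path)
            return True
        return False

    def prune(self):
        # Delete everything still scheduled and return the paths processed.
        removed = sorted(self._pending)
        for path in removed:
            # ignore_errors: a child may already be gone with its parent.
            shutil.rmtree(path, ignore_errors=True)
        self._pending.clear()
        return removed


def launch_run(gc, executor_dir, run_id):
    """Create a run directory, first cancelling any pending deletion of the parent."""
    gc.unschedule(executor_dir)
    run_dir = os.path.join(executor_dir, "runs", run_id)
    os.makedirs(run_dir)
    return run_dir


def demo():
    root = tempfile.mkdtemp()
    gc = GarbageCollector()
    reused = os.path.join(root, "executors", "reused")
    idle = os.path.join(root, "executors", "idle")

    # Both executors finish a first run; the run and its parent are scheduled.
    for executor_dir in (reused, idle):
        run_dir = launch_run(gc, executor_dir, "run-1")
        gc.schedule(run_dir)
        gc.schedule(executor_dir)

    # A new task arrives with the same executor id before the GC runs.
    launch_run(gc, reused, "run-2")
    gc.prune()

    result = (
        os.path.isdir(reused),
        os.path.isdir(os.path.join(reused, "runs", "run-2")),
        os.path.isdir(idle),
    )
    shutil.rmtree(root)
    return result
```

Calling demo() returns (True, True, False): the reused executor directory and its new run survive, while the idle executor's parent directory is removed instead of lingering as an empty directory.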

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
