Benjamin Mahler created MESOS-396:
-------------------------------------

             Summary: Slave GarbageCollector needs to delete the parent 
executor directories. It currently only deletes the executor run directories.
                 Key: MESOS-396
                 URL: https://issues.apache.org/jira/browse/MESOS-396
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Mahler
            Priority: Blocker


As a result, long-lived slaves accumulate a large number of empty 
executor directories. All that remains in each of these directories is a 
broken symlink to the 'latest' run.

Over time, as the number of empty executor directories approaches LINK_MAX, 
the slave will crash when mkdir fails, as was found in MESOS-391.
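
To gauge how close a slave is to this limit, something like the following 
Python sketch could be run against a work directory. It is only an 
illustration: the function names and the layout (one subdirectory per executor 
id under a single root, each holding a 'latest' symlink) are assumptions, not 
code from Mesos. The headroom estimate also assumes a filesystem where every 
subdirectory adds a hard link to its parent (ext3 is a common example).

```python
import os


def is_leftover_executor_dir(path):
    """True if `path` is a real directory holding nothing but broken symlinks.

    A completely empty directory also counts as leftover.
    """
    if os.path.islink(path) or not os.path.isdir(path):
        return False
    for name in os.listdir(path):
        entry = os.path.join(path, name)
        # os.path.exists follows symlinks, so it is False for a broken link.
        broken_link = os.path.islink(entry) and not os.path.exists(entry)
        if not broken_link:
            return False
    return True


def leftover_executor_dirs(executors_root):
    """Names of executor directories under the (hypothetical) root that are leftovers."""
    return sorted(
        name
        for name in os.listdir(executors_root)
        if is_leftover_executor_dir(os.path.join(executors_root, name))
    )


def link_headroom(executors_root):
    """Rough count of subdirectories that can still be created under the root.

    Assumes each subdirectory contributes one hard link to its parent, which
    holds on ext3-like filesystems but not everywhere.
    """
    limit = os.pathconf(executors_root, "PC_LINK_MAX")
    return limit - os.stat(executors_root).st_nlink
```

A small or shrinking link_headroom() together with a long 
leftover_executor_dirs() list would match the failure described above.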

The fix is to schedule the executor parent directories for deletion. However, 
the GC module does not know whether a parent executor directory can safely be 
deleted, because more tasks could be launched with the same executor id after 
the directory has been scheduled for deletion.
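
One possible way to handle this race, sketched below in Python, is to let the 
slave cancel a pending deletion whenever an executor id is reused. This is a 
hypothetical illustration of the idea, not the fix chosen for this issue: the 
class, its methods and launch_executor() are all made-up names, and deadlines 
are plain numbers supplied by the caller.

```python
import os
import shutil


class GarbageCollectorSketch:
    """Toy GC: remembers paths and the time after which each may be removed."""

    def __init__(self):
        self._deadlines = {}  # path -> earliest time the path may be removed

    def schedule(self, path, deadline):
        """Mark `path` for removal once `deadline` has passed."""
        self._deadlines[path] = deadline

    def unschedule(self, path):
        """Cancel a pending removal; returns True if one was pending."""
        return self._deadlines.pop(path, None) is not None

    def prune(self, now):
        """Remove every scheduled path whose deadline has passed."""
        removed = []
        for path, deadline in list(self._deadlines.items()):
            if deadline <= now:
                shutil.rmtree(path, ignore_errors=True)
                del self._deadlines[path]
                removed.append(path)
        return removed


def launch_executor(gc, executor_dir):
    """Hypothetical launch hook: reusing an executor id rescues its directory."""
    gc.unschedule(executor_dir)
    os.makedirs(executor_dir, exist_ok=True)
```

With this shape the GC module still does not need to know about executors. 
The caller that launches tasks is responsible for unscheduling the parent 
directory before it reuses it.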

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
