[ https://issues.apache.org/jira/browse/MESOS-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353394#comment-14353394 ]

Bernd Mathiske commented on MESOS-391:
--------------------------------------

A simple approach would be to fix the problem right inside 
"paths::createExecutorDirectory()":

1. Check how many subdirs the executor parent dir already has.
2. If that number is close to the limit, find out which of the existing subdirs 
are the oldest.
3. Delete one or more of those oldest subdirs.
4. Now it should be safe to proceed with the mkdir().
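Steps 1 to 4 can be sketched roughly as follows. This is a minimal C++17 sketch, not the actual paths::createExecutorDirectory() code. It assumes std::filesystem, takes "oldest" to mean oldest last-write time, and uses hypothetical names (subdirsOldestFirst, createDirWithEviction) and hypothetical parameters (maxSubdirs for the real limit, slack for "close to the limit"). It deletes synchronously, so it still has problem a) described next.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: immediate subdirectories of `parent`, oldest first
// (ordered by last-write time, which is an assumption about what "oldest" means).
std::vector<fs::path> subdirsOldestFirst(const fs::path& parent) {
  std::vector<fs::path> dirs;
  for (const fs::directory_entry& entry : fs::directory_iterator(parent)) {
    if (entry.is_directory()) {
      dirs.push_back(entry.path());
    }
  }
  std::sort(dirs.begin(), dirs.end(), [](const fs::path& a, const fs::path& b) {
    return fs::last_write_time(a) < fs::last_write_time(b);
  });
  return dirs;
}

// Hypothetical variant of createExecutorDirectory(). `maxSubdirs` stands in
// for the real limit (e.g. derived from LINK_MAX) and `slack` for how close
// to that limit counts as "close".
bool createDirWithEviction(const fs::path& parent, const std::string& name,
                           std::size_t maxSubdirs, std::size_t slack) {
  fs::create_directories(parent);

  // Step 1: count the existing subdirs (sorted oldest first for step 2).
  std::vector<fs::path> dirs = subdirsOldestFirst(parent);

  // Steps 2 and 3: while we are close to the limit, delete the oldest subdir.
  std::size_t evicted = 0;
  while (evicted < dirs.size() && dirs.size() - evicted + slack >= maxSubdirs) {
    fs::remove_all(dirs[evicted]);  // Recursive delete; may take a while.
    ++evicted;
  }

  // Step 4: the mkdir() should now succeed.
  std::error_code ec;
  return fs::create_directory(parent / name, ec);
}
```

With maxSubdirs = 3 and slack = 0, a parent that already holds three subdirs loses its oldest one before the new one is created, so it never holds more than three.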

We could then either immediately remove the deleted paths from the GC's 
internal bookkeeping, or we could just make sure that, whenever their GC time 
is up, it makes no difference whether they are already gone ("pre-deleted").
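Both bookkeeping options can be illustrated with a toy collector. ToyGarbageCollector and its methods are hypothetical; they are not the Mesos GarbageCollector API. unschedule() corresponds to the first option. collectDue() corresponds to the second: it relies on std::filesystem::remove_all() removing nothing and reporting no error when the path does not exist.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <map>
#include <system_error>

namespace fs = std::filesystem;
using Clock = std::chrono::steady_clock;

// Toy stand-in for the slave's GC bookkeeping (hypothetical, not Mesos code).
class ToyGarbageCollector {
 public:
  void schedule(const fs::path& path, Clock::time_point when) {
    pending_[path] = when;
  }

  // Option 1: drop a pre-deleted path from the bookkeeping right away.
  bool unschedule(const fs::path& path) { return pending_.erase(path) > 0; }

  // Option 2: when a path's GC time is up, treat "already gone" as success.
  // Returns how many due paths were collected (including pre-deleted ones).
  std::size_t collectDue(Clock::time_point now) {
    std::size_t collected = 0;
    for (auto it = pending_.begin(); it != pending_.end();) {
      if (it->second > now) {  // Not due yet.
        ++it;
        continue;
      }
      std::error_code ec;
      fs::remove_all(it->first, ec);  // Missing path: removes 0 files, no error.
      if (ec) {  // A real failure (e.g. permission denied): keep it scheduled.
        ++it;
        continue;
      }
      it = pending_.erase(it);
      ++collected;
    }
    return collected;
  }

  std::size_t pendingCount() const { return pending_.size(); }

 private:
  std::map<fs::path, Clock::time_point> pending_;
};
```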

Problems with this approach:
a) Recursively deleting a directory may take a while, and this operation blocks 
the slave process (actor) from making progress. The mkdir() itself already has 
this problem, but it is less likely to take long (although ultimately it 
might), since it is a single file operation. In contrast, the number of file 
system operations for a recursive deletion is in general unknown and could 
potentially be large.
b) If you are close to the limit and remove only one subdir, you may end up 
doing so again and again for many tasks.
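One common way to avoid problem b) is hysteresis. This technique is not proposed in the comment itself and is shown only as an illustration. Once the subdir count reaches a high-water mark, evict down to a lower mark in a single pass. The function dirsToEvict and its highWater/lowWater parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical eviction policy with hysteresis: nothing is evicted below
// `highWater`; at or above it, evict enough subdirs to get back down to
// `lowWater`, so the cleanup is not repeated for every new task.
std::size_t dirsToEvict(std::size_t count, std::size_t highWater,
                        std::size_t lowWater) {
  assert(lowWater < highWater);
  if (count < highWater) {
    return 0;  // Not close to the limit yet.
  }
  return count - lowWater;
}
```

After one eviction pass, roughly highWater - lowWater more executor dirs can be created before the next pass is needed, instead of a pass on every task.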

I propose we deal with problem a) by handling the deletion in a different 
process than the slave process. The GC process is an obvious candidate. In the 
slave process, we can wait for a future that signals the completion of the 
deletion. (There are some concurrency issues that we will get to later.)
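The hand-off could look roughly like this. The sketch uses std::async and std::future as a stand-in for dispatching to the GC process and waiting on a libprocess Future. The names pruneAsync and createAfterPrune are hypothetical, and the concurrency issues mentioned in the comment are not addressed.

```cpp
#include <cassert>
#include <filesystem>
#include <future>
#include <system_error>
#include <utility>

namespace fs = std::filesystem;

// Stand-in for dispatching the deletion to the GC process: the recursive
// delete runs on another thread, and the caller gets a future for the result.
std::future<bool> pruneAsync(fs::path dir) {
  return std::async(std::launch::async, [dir = std::move(dir)]() {
    std::error_code ec;
    fs::remove_all(dir, ec);
    return !ec;  // true if the deletion succeeded (or there was nothing to delete).
  });
}

// Hypothetical slave-side step: create the new executor dir only after the
// deletion has completed.
bool createAfterPrune(const fs::path& oldest, const fs::path& fresh) {
  std::future<bool> pruned = pruneAsync(oldest);
  // With libprocess, the slave would attach a continuation to the Future
  // instead of blocking here; get() just keeps the sketch short.
  if (!pruned.get()) {
    return false;
  }
  std::error_code ec;
  return fs::create_directory(fresh, ec);
}
```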

The advantage of this approach is that it is "watertight": executor dirs can 
then no longer cause LINK_MAX to be exceeded.

On the other hand, maybe speeding up GC for subdirs once a parent dir is more 
than, say, 3/4 full is always fast enough? But how could you know that for sure 
if the duration of a file deletion is in principle unknown?
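For comparison, the "speed up GC" alternative could be expressed as a delay policy. In this sketch, the normal GC delay applies while link usage is at or below 3/4 of the limit, and the delay then shrinks linearly to zero at the limit. The function gcDelay and the linear shape are assumptions chosen for illustration. As the question above points out, no such policy can guarantee that deletions keep up.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Hypothetical policy: keep `base` while links/limit <= threshold, then scale
// the delay down linearly so that it reaches zero when the limit is reached.
std::chrono::seconds gcDelay(std::chrono::seconds base, long links, long limit,
                             double threshold = 0.75) {
  if (limit <= 0) {
    return base;  // No known limit: keep the normal delay.
  }
  double usage = static_cast<double>(links) / static_cast<double>(limit);
  if (usage <= threshold) {
    return base;
  }
  double factor = std::max(0.0, (1.0 - usage) / (1.0 - threshold));
  return std::chrono::seconds(
      static_cast<long long>(static_cast<double>(base.count()) * factor));
}
```

For example, with a one-week base delay (604800 s) and a parent dir at 7/8 of its limit, the delay drops to half a week (302400 s).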

That said, the two approaches can of course be combined. I would start with the 
watertight one and add the other one if still so desired.


> Slave GarbageCollector needs to also take into account the number of links, 
> when determining removal time.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-391
>                 URL: https://issues.apache.org/jira/browse/MESOS-391
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benjamin Mahler
>            Assignee: Bernd Mathiske
>              Labels: twitter
>
> The slave garbage collector does not take into account the number of links 
> present, which means that if we create a lot of executor directories (up to 
> LINK_MAX), we won't necessarily GC.
> As a result of this, the slave crashes:
> F0313 21:40:02.926494 33746 paths.hpp:233] CHECK_SOME(mkdir) failed: Failed to create executor directory '/var/lib/mesos/slaves/201303090208-1937777162-5050-38880-267/frameworks/201103282247-0000000019-0000/executors/thermos-1363210801777-mesos-meta_slave_0-27-e74e4b30-dcf1-4e88-8954-dd2b40b7dd89/runs/499fcc13-c391-421c-93d2-a56d1a4a931e': Too many links
> *** Check failure stack trace: ***
>     @     0x7f9320f82f9d  google::LogMessage::Fail()
>     @     0x7f9320f88c07  google::LogMessage::SendToLog()
>     @     0x7f9320f8484c  google::LogMessage::Flush()
>     @     0x7f9320f84ab6  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f9320c70312  _CheckSome::~_CheckSome()
>     @     0x7f9320c9dd5c  mesos::internal::slave::paths::createExecutorDirectory()
>     @     0x7f9320c9e60d  mesos::internal::slave::Framework::createExecutor()
>     @     0x7f9320c7a7f7  mesos::internal::slave::Slave::runTask()
>     @     0x7f9320c9cb43  ProtobufProcess<>::handler4<>()
>     @     0x7f9320c8678b  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7f9320c9d1ab  ProtobufProcess<>::visit()
>     @     0x7f9320e4c774  process::MessageEvent::visit()
>     @     0x7f9320e40a1d  process::ProcessManager::resume()
>     @     0x7f9320e41268  process::schedule()
>     @     0x7f932055973d  start_thread
>     @     0x7f931ef3df6d  clone
> The fix here is to take into account the number of links (st_nlink) when 
> determining whether we need to GC.
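The two numbers the issue refers to can be read with POSIX stat() (the st_nlink field) and pathconf() with _PC_LINK_MAX. The helpers linkCount, linkMax and hasRoomFor are hypothetical. On many traditional Unix file systems (ext4, for example), a directory's st_nlink is 2 plus its number of subdirectories, which is why executor dirs run into LINK_MAX. Not every file system follows that convention, so this is an assumption to verify per deployment.

```cpp
#include <cassert>
#include <sys/stat.h>
#include <unistd.h>

// Link count of `dir` (st_nlink), or -1 if stat() fails.
long linkCount(const char* dir) {
  struct stat st;
  if (::stat(dir, &st) != 0) {
    return -1;
  }
  return static_cast<long>(st.st_nlink);
}

// LINK_MAX for the file system holding `dir`, or -1 if pathconf() reports
// no limit or fails.
long linkMax(const char* dir) {
  return ::pathconf(dir, _PC_LINK_MAX);
}

// Hypothetical check: can `extra` more subdirs be created under `dir`
// without exceeding LINK_MAX? Treats "no reported limit" as unlimited
// (an assumption).
bool hasRoomFor(const char* dir, long extra) {
  long links = linkCount(dir);
  if (links < 0) {
    return false;
  }
  long limit = linkMax(dir);
  if (limit < 0) {
    return true;
  }
  return links + extra <= limit;
}
```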



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)