[ https://issues.apache.org/jira/browse/MESOS-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348941#comment-14348941 ]

Ritwik Yadav commented on MESOS-391:
------------------------------------

Failed attempts at solutions and some questions:

I have tried a couple of solutions which were not accepted. I describe them 
below in chronological order:

1. Instead of scheduling every directory for garbage collection 
‘flags.gc_delay’ seconds after its last modification time, I suggested that we 
‘immediately’ remove the ones with a high subdirectory count, or otherwise 
make the delay a function of the number of links (a rough sketch follows 
below). This solution was rejected because it changed the order in which the 
directories get garbage collected.
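
To make (1) concrete, here is a rough sketch (not an actual patch) of the 
immediate-removal variant, using only plain POSIX calls. The 90% threshold and 
the helper name are made up for illustration:

#include <sys/stat.h>
#include <unistd.h>
#include <string>

bool shouldRemoveImmediately(const std::string& dir)
{
  struct stat s;
  if (::stat(dir.c_str(), &s) != 0) {
    return false;
  }

  long linkMax = ::pathconf(dir.c_str(), _PC_LINK_MAX);
  if (linkMax <= 0) {
    return false;
  }

  // On most filesystems each subdirectory adds one hard link to its parent,
  // so st_nlink is a cheap proxy for the subdirectory count. The 90%
  // threshold is arbitrary and only meant to illustrate the idea.
  return s.st_nlink >= 0.9 * linkMax;
}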

2. Another solution I tried was to modify the separate thread that monitors 
disk usage so that it also monitors the number of executor directories across 
all frameworks, and speeds up garbage collection for all directories 
(scheduled to be garbage collected within the next ‘t’ seconds) whenever any 
one framework has a high number of executor directories, where,
t = function (disk_usage_by_slave, 
highest_number_of_executor_directories_in_any_active_framework)

This solution was criticized because an excessive number of executor 
directories in one framework would then affect garbage collection in other 
frameworks. 

Later on, I realized that this solution was incorrect anyway, as it would not 
necessarily remove the directory with the highest number of subdirectories. 
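
For reference, this is roughly how I imagined ‘t’ being computed in (2); the 
function and parameter names are hypothetical, and the linear scaling is only 
an illustration of the shape:

#include <algorithm>

// diskUsage in [0, 1]; maxExecutorDirs is the highest executor directory
// count seen in any active framework; linkMax comes from
// pathconf(_PC_LINK_MAX); gcDelaySecs stands in for flags.gc_delay.
double pruneWindowSecs(
    double diskUsage, long maxExecutorDirs, long linkMax, double gcDelaySecs)
{
  double linkPressure =
      static_cast<double>(maxExecutorDirs) / std::max(1L, linkMax);
  double pressure = std::max(diskUsage, linkPressure);

  // The higher the pressure, the larger the window of already-scheduled
  // removals that gets pulled forward.
  return gcDelaySecs * std::min(1.0, pressure);
}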

I also need to modify the slave::garbageCollect function to schedule each 
directory for removal after ‘d’ seconds. In the present implementation, ‘d’ is 
computed so that garbage collection takes place ‘flags.gc_delay’ seconds after 
the last modification time. I propose that we make ‘d’ a function of the 
number of subdirectories present in that directory. As per my understanding, 
finding this function is the key to solving this issue. The form of the 
function also needs to be thought through:
a) d = function (last_modification_time, number_of_subdirectories)
or,
b) d = function (number_of_subdirectories)
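
As one possible shape for form (a), here is a sketch that keeps the usual 
"‘flags.gc_delay’ after the last modification time" behaviour and shrinks the 
delay as the directory fills up relative to LINK_MAX. The names and the linear 
scaling are assumptions of mine, not a proposed final formula:

#include <ctime>
#include <algorithm>

double scheduleDelaySecs(
    time_t lastModified,       // st_mtime of the directory
    long numSubdirectories,    // current subdirectory count
    long linkMax,              // pathconf(_PC_LINK_MAX)
    double gcDelaySecs)        // flags.gc_delay in seconds
{
  // Time left until the default flags.gc_delay deadline.
  double elapsed = std::difftime(std::time(nullptr), lastModified);
  double remaining = std::max(0.0, gcDelaySecs - elapsed);

  // Scale the remaining delay down linearly with how full the directory is;
  // at LINK_MAX the delay drops to zero.
  double fullness =
      static_cast<double>(numSubdirectories) / std::max(1L, linkMax);
  return remaining * std::max(0.0, 1.0 - fullness);
}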

With this, I would like to draw attention to some of the fundamental questions 
I had:


1. What is the major reason behind scheduling every directory for removal 
flags.gc_delay seconds after its last modification time?
2. Is the present order in which garbage collection is done important? For 
example, if I schedule two directories X and Y, with X scheduled after t1 
seconds and Y after t2 seconds, does t1 < t2 strictly mandate that X always be 
deleted before Y?
3. This problem was encountered while creating ‘executor’ directories, so 
there is evidence that the number of executors may exceed _PC_LINK_MAX. What 
about the number of executor runs? Are we certain that the number of executor 
runs will always stay below _PC_LINK_MAX? (A small sketch for checking this is 
below.)
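
For question 3, this is the kind of check I had in mind: compare a runs 
directory’s current link count against the filesystem’s LINK_MAX. The path is 
a made-up placeholder, not a real layout on any particular slave:

#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main()
{
  // Hypothetical runs directory, for illustration only.
  const char* runsDir =
      "/var/lib/mesos/slaves/<slave>/frameworks/<framework>/executors/<executor>/runs";

  long linkMax = ::pathconf(runsDir, _PC_LINK_MAX);

  struct stat s;
  if (::stat(runsDir, &s) == 0 && linkMax > 0) {
    std::printf("links=%ld of LINK_MAX=%ld\n",
                static_cast<long>(s.st_nlink), linkMax);
  }

  return 0;
}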


> Slave GarbageCollector needs to also take into account the number of links, 
> when determining removal time.
> ----------------------------------------------------------------------------------------------------------
>
>                 Key: MESOS-391
>                 URL: https://issues.apache.org/jira/browse/MESOS-391
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Benjamin Mahler
>            Assignee: Ritwik Yadav
>              Labels: twitter
>
> The slave garbage collector does not take into account the number of links 
> present, which means that if we create a lot of executor directories (up to 
> LINK_MAX), we won't necessarily GC.
> As a result of this, the slave crashes:
> F0313 21:40:02.926494 33746 paths.hpp:233] CHECK_SOME(mkdir) failed: Failed 
> to create executor directory 
> '/var/lib/mesos/slaves/201303090208-1937777162-5050-38880-267/frameworks/201103282247-0000000019-0000/executors/thermos-1363210801777-mesos-meta_slave_0-27-e74e4b30-dcf1-4e88-8954-dd2b40b7dd89/runs/499fcc13-c391-421c-93d2-a56d1a4a931e':
>  Too many links
> *** Check failure stack trace: ***
>     @     0x7f9320f82f9d  google::LogMessage::Fail()
>     @     0x7f9320f88c07  google::LogMessage::SendToLog()
>     @     0x7f9320f8484c  google::LogMessage::Flush()
>     @     0x7f9320f84ab6  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f9320c70312  _CheckSome::~_CheckSome()
>     @     0x7f9320c9dd5c  
> mesos::internal::slave::paths::createExecutorDirectory()
>     @     0x7f9320c9e60d  mesos::internal::slave::Framework::createExecutor()
>     @     0x7f9320c7a7f7  mesos::internal::slave::Slave::runTask()
>     @     0x7f9320c9cb43  ProtobufProcess<>::handler4<>()
>     @     0x7f9320c8678b  std::tr1::_Function_handler<>::_M_invoke()
>     @     0x7f9320c9d1ab  ProtobufProcess<>::visit()
>     @     0x7f9320e4c774  process::MessageEvent::visit()
>     @     0x7f9320e40a1d  process::ProcessManager::resume()
>     @     0x7f9320e41268  process::schedule()
>     @     0x7f932055973d  start_thread
>     @     0x7f931ef3df6d  clone
> The fix here is to take into account the number of links (st_nlinks), when 
> determining whether we need to GC.


