[
https://issues.apache.org/jira/browse/MESOS-391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348941#comment-14348941
]
Ritwik Yadav commented on MESOS-391:
------------------------------------
Failed attempts at solutions and some questions:
I have tried a couple of solutions, neither of which was accepted. I have
described them in chronological order:
1. Instead of scheduling every directory for garbage collection
'flags.gc_delay' seconds after its last modification time, I suggested that we
'immediately' remove the ones that had a high subdirectory count, or
alternatively make this delay a function of the number of links (see the
sketch after this list). This solution was rejected because it changed the
order in which the directories got garbage collected.
2. Another solution that I tried was to modify the separate thread that
monitors disk usage to also monitor the number of executor directories across
all frameworks, and to speed up the garbage collection process for all
directories (those scheduled to be garbage collected within the next 't'
seconds) if any one framework had a high number of executor directories, where,
t = function (disk_usage_by_slave,
highest_number_of_executor_directories_in_any_active_framework)
This solution was criticized because an excessive number of executor
directories in one framework would then affect garbage collection in other
frameworks.
Later on, I realized that this solution was incorrect as this would not
necessarily remove the directory with the highest number of subdirectories.
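For concreteness, here is a minimal sketch of the delay-as-a-function-of-links
idea from (1) above; the helper, the 'linkMax' parameter and the linear
falloff are my own illustration, not existing Mesos code:
{code:cpp}
#include <sys/stat.h>

#include <algorithm>

// Returns the GC delay in seconds for 'path', scaled down linearly as the
// directory's hard-link count (one link per subdirectory, plus '.' and the
// entry in the parent) approaches 'linkMax'.
double gcDelaySeconds(const char* path, double gcDelay, long linkMax)
{
  struct stat s;
  if (::stat(path, &s) != 0) {
    return gcDelay; // On error, fall back to the configured delay.
  }

  double usage = static_cast<double>(s.st_nlink) / linkMax;

  // Full delay when the directory is nearly empty; (almost) immediate
  // removal as the link count approaches the limit.
  return gcDelay * std::max(0.0, 1.0 - usage);
}
{code}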
I also need to modify the slave::garbageCollect function to schedule each
directory after 'd' seconds. In the present implementation, 'd' is computed so
that garbage collection takes place 'flags.gc_delay' seconds after the last
modification time. I propose that we make 'd' a function of the number of
subdirectories present in that directory. As per my understanding, finding
this function is the key to solving this issue. The form of the function also
needs to be decided (a sketch of form (a) follows the two options below):
a) d = function (last_modification_time, number_of_subdirectories)
or,
b) d = function (number_of_subdirectories)
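As a starting point for discussion, here is a minimal sketch of form (a); the
helper name, the 'linkMax' parameter and the linear scaling are placeholders
of mine, not a concrete proposal for the final function:
{code:cpp}
#include <algorithm>
#include <ctime>

// d = function(last_modification_time, number_of_subdirectories):
// start from the usual "gc_delay seconds after mtime" schedule, then
// scale the remaining delay by how close the subdirectory count is to
// the link limit, so that crowded directories are collected sooner.
double scheduleDelay(std::time_t mtime, long subdirectories,
                     double gcDelay, long linkMax)
{
  // Baseline: collect 'gcDelay' seconds after the last modification time.
  double remaining =
    std::max(0.0, gcDelay - std::difftime(std::time(nullptr), mtime));

  // Shrink the remaining delay as the subdirectory count approaches the
  // link limit; at the limit the directory becomes eligible immediately.
  double usage = static_cast<double>(subdirectories) / linkMax;
  return remaining * std::max(0.0, 1.0 - usage);
}
{code}
Note that form (b) would be the special case where the mtime-based baseline is
dropped and only the scaling by subdirectory count remains.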
With this, I would like to draw attention to some of the fundamental questions
I had:
1. What is the major reason behind scheduling every directory for removal
flags.gc_delay seconds after its last modification time?
2. Is the present order in which garbage collection is done important? Say,
for example, I schedule two directories X and Y, where X is scheduled after t1
seconds and Y after t2 seconds. Does the premise t1 < t2 strictly mandate that
X always be deleted before Y?
3. This problem was encountered when creating a large number of 'executor'
directories, so there is evidence that the number of executors may exceed
_PC_LINK_MAX. What about the number of executor runs? Are we certain that the
number of executor runs will always stay below _PC_LINK_MAX? (A snippet for
querying _PC_LINK_MAX follows.)
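For reference, the limit can be queried per path with pathconf(3); a
self-contained snippet, with the slave work directory path used purely as an
example:
{code:cpp}
#include <unistd.h>

#include <cerrno>
#include <cstdio>

int main()
{
  errno = 0;
  long linkMax = ::pathconf("/var/lib/mesos/slaves", _PC_LINK_MAX);

  if (linkMax == -1 && errno == 0) {
    std::printf("no link limit on this filesystem\n");
  } else if (linkMax == -1) {
    std::perror("pathconf");  // A real error, e.g. the path does not exist.
  } else {
    std::printf("LINK_MAX = %ld\n", linkMax);
  }
  return 0;
}
{code}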
> Slave GarbageCollector needs to also take into account the number of links,
> when determining removal time.
> ----------------------------------------------------------------------------------------------------------
>
> Key: MESOS-391
> URL: https://issues.apache.org/jira/browse/MESOS-391
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Mahler
> Assignee: Ritwik Yadav
> Labels: twitter
>
> The slave garbage collector does not take into account the number of links
> present, which means that if we create a lot of executor directories (up to
> LINK_MAX), we won't necessarily GC.
> As a result of this, the slave crashes:
> F0313 21:40:02.926494 33746 paths.hpp:233] CHECK_SOME(mkdir) failed: Failed
> to create executor directory
> '/var/lib/mesos/slaves/201303090208-1937777162-5050-38880-267/frameworks/201103282247-0000000019-0000/executors/thermos-1363210801777-mesos-meta_slave_0-27-e74e4b30-dcf1-4e88-8954-dd2b40b7dd89/runs/499fcc13-c391-421c-93d2-a56d1a4a931e':
> Too many links
> *** Check failure stack trace: ***
> @ 0x7f9320f82f9d google::LogMessage::Fail()
> @ 0x7f9320f88c07 google::LogMessage::SendToLog()
> @ 0x7f9320f8484c google::LogMessage::Flush()
> @ 0x7f9320f84ab6 google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f9320c70312 _CheckSome::~_CheckSome()
> @ 0x7f9320c9dd5c mesos::internal::slave::paths::createExecutorDirectory()
> @ 0x7f9320c9e60d mesos::internal::slave::Framework::createExecutor()
> @ 0x7f9320c7a7f7 mesos::internal::slave::Slave::runTask()
> @ 0x7f9320c9cb43 ProtobufProcess<>::handler4<>()
> @ 0x7f9320c8678b std::tr1::_Function_handler<>::_M_invoke()
> @ 0x7f9320c9d1ab ProtobufProcess<>::visit()
> @ 0x7f9320e4c774 process::MessageEvent::visit()
> @ 0x7f9320e40a1d process::ProcessManager::resume()
> @ 0x7f9320e41268 process::schedule()
> @ 0x7f932055973d start_thread
> @ 0x7f931ef3df6d clone
> The fix here is to take into account the number of links (st_nlink), when
> determining whether we need to GC.