Lin Zhao created MESOS-941:
------------------------------

             Summary: Memory limit not correctly set when executor launched with no memory resource
                 Key: MESOS-941
                 URL: https://issues.apache.org/jira/browse/MESOS-941
             Project: Mesos
          Issue Type: Bug
          Components: slave
            Reporter: Lin Zhao


When a framework sets memory resources only on its tasks and none on the 
executor, the slave fails to apply the memory controls needed to limit the 
executor's memory usage. The executor process can then use more resident 
memory than the tasks specify.

Example framework: https://gist.github.com/lin-zhao/8544495. This framework was 
tested with Mesos 0.14.2 on CentOS 5.

According to Benjamin Mahler:

What's happening is that you're launching an executor with no resources. 
Consequently, before we fork, we attempt to update the memory control, but we 
don't call the memory handler since the executor has no memory resources:

I0121 19:39:01.660071  8566 cgroups_isolator.cpp:516] Launching default (/home/lin/test-executor) in /tmp/mesos/slaves/201312032357-3645772810-5050-2033-0/frameworks/201401171812-2907575306-5050-19011-0020/executors/default/runs/8bc2ab10-8988-4b22-afa2-3433bbedc3ed with resources  for framework 201401171812-2907575306-5050-19011-0020 in cgroup mesos/framework_201401171812-2907575306-5050-19011-0020_executor_default_tag_8bc2ab10-8988-4b22-afa2-3433bbedc3ed
I0121 19:39:01.663082  8566 cgroups_isolator.cpp:709] Changing cgroup controls for executor default of framework 201401171812-2907575306-5050-19011-0020 with resources 
I0121 19:39:01.667129  8566 cgroups_isolator.cpp:1163] Started listening for OOM events for executor default of framework 201401171812-2907575306-5050-19011-0020
I0121 19:39:01.681857  8566 cgroups_isolator.cpp:568] Forked executor at = 27609
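
The empty "with resources" field in the log lines above is the key detail. As 
a rough illustration only (this is not the Mesos source; applyControls, the 
Resources typedef, and the handler table are hypothetical names), a dispatcher 
that invokes one handler per resource present will never reach the memory 
handler when the resource set is empty:

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a resource set: resource name -> amount.
typedef std::map<std::string, std::uint64_t> Resources;

// Hypothetical dispatcher: calls one handler per resource that is present
// and returns the names of the handlers it invoked.
std::vector<std::string> applyControls(const Resources& resources)
{
  std::vector<std::string> invoked;

  // Assumed handler table, keyed by resource name.
  std::map<std::string, std::function<void(std::uint64_t)>> handlers;
  handlers["cpus"] = [&invoked](std::uint64_t) { invoked.push_back("cpus"); };
  handlers["mem"] = [&invoked](std::uint64_t) { invoked.push_back("mem"); };

  // Only resources that actually appear trigger their handler, so an
  // executor with no "mem" entry never has its memory control updated.
  for (const auto& resource : resources) {
    auto it = handlers.find(resource.first);
    if (it != handlers.end()) {
      it->second(resource.second);
    }
  }

  return invoked;
}
```

Under these assumptions, applyControls(Resources()) invokes no handlers at 
all, which matches an executor launched with no resources.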

Then, later, when we are updating the resources for your 128MB task, we set the 
soft limit, but we don't set the hard limit because the following buggy check 
is not satisfied:

  // Determine whether to set the hard limit. If this is the first
  // time (info->pid.isNone()), or we're raising the existing limit,
  // then we can update the hard limit safely. Otherwise, if we need
  // to decrease 'memory.limit_in_bytes' we may induce an OOM if too
  // much memory is in use. As a result, we only update the soft
  // limit when the memory reservation is being reduced. This is
  // probably okay if the machine has available resources.
  // TODO(benh): Introduce a MemoryWatcherProcess which monitors the
  // discrepancy between usage and soft limit and introduces a
  // "manual oom" if necessary.
  if (info->pid.isNone() || limit > currentLimit.get()) {
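
One plausible reading of why this condition is not satisfied, sketched as 
standalone code rather than the actual Mesos function: the executor has 
already been forked, so the pid is set, and a cgroup whose 
'memory.limit_in_bytes' was never written is assumed here to report a very 
large "unlimited" value, so a 128MB limit is not greater than the current 
limit. shouldUpdateHardLimit and kUnlimited are hypothetical names, and the 
value of kUnlimited is an assumption.

```cpp
#include <cstdint>
#include <limits>

// Assumed stand-in for what an untouched 'memory.limit_in_bytes' reports;
// the exact value depends on the kernel.
const std::uint64_t kUnlimited =
    static_cast<std::uint64_t>(std::numeric_limits<std::int64_t>::max());

// Mirrors the shape of the quoted condition: update the hard limit only on
// the first update (no pid yet) or when the limit is being raised.
bool shouldUpdateHardLimit(
    bool pidIsNone,
    std::uint64_t limit,
    std::uint64_t currentLimit)
{
  return pidIsNone || limit > currentLimit;
}
```

With the executor already forked and the hard limit never set, 
shouldUpdateHardLimit(false, 128ULL * 1024 * 1024, kUnlimited) evaluates to 
false, so only the soft limit would be written.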


