[
https://issues.apache.org/jira/browse/MESOS-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Lin Zhao updated MESOS-941:
---------------------------
Description:
When a framework is launched with memory resources set only on the tasks and
none set at the executor level, the slave fails to apply the memory controls
needed to limit the executor's memory usage. The executor process can then use
more resident memory than the tasks specify.
Example framework: https://gist.github.com/lin-zhao/8544495. This framework was
tested with Mesos 0.14.2 on CentOS 6, kernel 3.10.11-1.el6.x86_64.
According to Benjamin Mahler:
What's happening is that you're launching an executor with no resources;
consequently, before we fork, we attempt to update the memory control, but we
don't call the memory handler since the executor has no memory resources:
I0121 19:39:01.660071 8566 cgroups_isolator.cpp:516] Launching default
(/home/lin/test-executor) in
/tmp/mesos/slaves/201312032357-3645772810-5050-2033-0/frameworks/201401171812-2907575306-5050-19011-0020/executors/default/runs/8bc2ab10-8988-4b22-afa2-3433bbedc3ed
with resources for framework 201401171812-2907575306-5050-19011-0020 in
cgroup
mesos/framework_201401171812-2907575306-5050-19011-0020_executor_default_tag_8bc2ab10-8988-4b22-afa2-3433bbedc3ed
I0121 19:39:01.663082 8566 cgroups_isolator.cpp:709] Changing cgroup controls
for executor default of framework 201401171812-2907575306-5050-19011-0020 with
resources
I0121 19:39:01.667129 8566 cgroups_isolator.cpp:1163] Started listening for
OOM events for executor default of framework
201401171812-2907575306-5050-19011-0020
I0121 19:39:01.681857 8566 cgroups_isolator.cpp:568] Forked executor at = 27609
Then, later, when we update the resources for your 128MB task, we set the
soft limit, but we don't set the hard limit, because the following buggy check
is not satisfied:
// Determine whether to set the hard limit. If this is the first
// time (info->pid.isNone()), or we're raising the existing limit,
// then we can update the hard limit safely. Otherwise, if we need
// to decrease 'memory.limit_in_bytes' we may induce an OOM if too
// much memory is in use. As a result, we only update the soft
// limit when the memory reservation is being reduced. This is
// probably okay if the machine has available resources.
// TODO(benh): Introduce a MemoryWatcherProcess which monitors the
// discrepancy between usage and soft limit and introduces a
// "manual oom" if necessary.
if (info->pid.isNone() || limit > currentLimit.get()) {
> Memory limit not correctly set when no memory resource set on executor level
> ----------------------------------------------------------------------------
>
> Key: MESOS-941
> URL: https://issues.apache.org/jira/browse/MESOS-941
> Project: Mesos
> Issue Type: Bug
> Components: slave
> Reporter: Lin Zhao
>
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)