[
https://issues.apache.org/jira/browse/MESOS-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265021#comment-14265021
]
Jie Yu edited comment on MESOS-1588 at 1/5/15 8:25 PM:
-------------------------------------------------------
It seems that this becomes quite important once we start to support persistent
disks (MESOS-1554). In the pre-persistent-disk world, a task's sandbox is GCed
after the task terminates, so disk quota enforcement is not an urgent issue:
even if a task uses more disk than it requested, the overcommitted disk
resources will be reclaimed once it terminates.

However, with persistent disks, this feature becomes necessary because
persistent disks are not auto-GCed. If a task keeps writing more data to its
persistent disk than it requested, the slave will slowly run out of disk space
while the master/allocator still thinks that there is disk space available on
the slave.

There are multiple ways to achieve disk quota enforcement in Mesos. The ideal
solution is to construct a file system for each disk resource so that the quota
is enforced by the file system itself. For example, a task's sandbox would
actually be a file system created from a raw disk device, an LVM volume, or a
file in the root file system. The task would receive an ENOSPC when it tries to
write more data than it requested.
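
To make the ENOSPC behavior concrete, here is a minimal sketch of what a task
would see under file-system-level enforcement. The helper name try_append is
hypothetical and not part of Mesos; it only illustrates that a write fails with
errno ENOSPC once the backing file system is full.

```python
import errno


def try_append(path, data):
    """Append data to path; return False if the file system reports ENOSPC.

    Hypothetical helper for illustration only. The file is opened unbuffered
    so that a full file system surfaces as an OSError on write() itself.
    A real task would also have to handle short (partial) writes.
    """
    try:
        with open(path, "ab", buffering=0) as f:
            f.write(data)
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:
            # This is the hard limit the file system enforces for us.
            return False
        raise
```

On Linux, writing to /dev/full always fails with ENOSPC, so
try_append("/dev/full", b"x") is a quick way to exercise the failure path.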

However, the ideal solution either assumes something that's not available on
all platforms (raw device, LVM) or has unknown performance characteristics
(a file system on top of a file system). I am going to propose an intermediate
solution here which is less intrusive and fits in our current code base quite
well.

How about adding a new Isolator in MesosContainerizer called DiskQuotaIsolator?
It would periodically scan the disks (sandbox and persistent disks) using du
and report a Limitation (like CgroupsMemIsolator) once a container uses more
disk than it requested. The frequency and pace of du should be limited so that
it does not cause too much interference with the running tasks.
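
The scan-and-compare step could look roughly like the sketch below. This is
not the proposed isolator (which would live in the Mesos C++ code base and
shell out to du); it is a standalone illustration. The function names and the
path-to-limit mapping are assumptions made for the example, and it sums
apparent file sizes instead of invoking du.

```python
import os


def disk_usage_bytes(root):
    """Sum apparent file sizes under root (roughly `du -s --apparent-size`)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                total += os.lstat(os.path.join(dirpath, name)).st_size
            except OSError:
                # The task is still running, so a file may vanish between
                # the directory listing and the stat call; skip it.
                continue
    return total


def check_quota(limits):
    """Return {path: usage} for every path whose usage exceeds its limit.

    `limits` maps a directory (sandbox or persistent disk) to its requested
    size in bytes. Both the mapping and this function are hypothetical.
    """
    violations = {}
    for path, limit in limits.items():
        usage = disk_usage_bytes(path)
        if usage > limit:
            violations[path] = usage
    return violations
```

A periodic caller would run check_quota at a throttled interval and report a
limitation for each returned path, which is where the container would be
killed.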

As you can see, this is not strict enforcement because a task can still go
over its disk space limit between scans. Let's call it soft enforcement.
Hopefully, we can tune the du frequency so that a task cannot exceed its disk
space limit by too much.
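
As a rough back-of-the-envelope model of how far a task can overshoot under
soft enforcement, assume it sits at its limit just after one scan and then
writes at a constant rate until the next scan finishes. The function below and
the numbers in the usage note are illustrative assumptions, not measurements.

```python
def worst_case_overshoot_bytes(write_rate_bytes_per_sec,
                               scan_interval_secs,
                               scan_duration_secs=0):
    """Upper bound on bytes written past the limit before detection.

    Assumes a constant write rate and that the violation is only noticed
    when the next scan completes (interval plus the time the scan takes).
    """
    return write_rate_bytes_per_sec * (scan_interval_secs + scan_duration_secs)
```

For example, a task writing 50 MiB/s with a 60 second scan interval and a
5 second scan could be about 3250 MiB over its limit before it is caught,
which is the trade-off to weigh when tuning the du frequency.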

Another interesting thing to discuss here is what happens if a persistent disk
goes over its limit. The container that has the persistent disk will get killed
once du detects that it's over its limit. Now, what should the user do if they
want to recover the data on the persistent disk? They cannot launch a task to
recover the data because that task will get killed as well. And currently, we
do not support resizing persistent disks. That's a problem with any soft
enforcement solution.
> Enforce disk quota in MesosContainerizer
> ----------------------------------------
>
> Key: MESOS-1588
> URL: https://issues.apache.org/jira/browse/MESOS-1588
> Project: Mesos
> Issue Type: Improvement
> Affects Versions: 0.20.0
> Reporter: Ian Downes
> Assignee: Ian Downes
>
> Once we have disk usage we should enforce this. Containers that exceed their
> quota should be terminated, i.e., the filesystem isolator should set a
> Limitation so the MesosContainerizer kills the container.
> Disk quota enforcement should be optional to permit a transition period where
> disk usage is monitored before enabling enforcement.