[ 
https://issues.apache.org/jira/browse/MESOS-1588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14265021#comment-14265021
 ] 

Jie Yu edited comment on MESOS-1588 at 1/5/15 8:25 PM:
-------------------------------------------------------

This seems to become quite important once we start to support persistent 
disks (MESOS-1554). In the pre-persistent-disk world, a task's sandbox is 
GCed after the task terminates, so disk quota enforcement is not an 
urgent issue (even if a task uses more disk than it requested, the 
overcommitted disk resources will be reclaimed once it terminates).

However, with persistent disks, this feature becomes necessary because 
persistent disks will not be auto-GCed. If a task writes more data to its 
persistent disk than it requested, the slave will slowly run out of disk 
space while the master/allocator still thinks that there is disk space 
available on the slave.

There are multiple ways to achieve disk quota enforcement in Mesos. The ideal 
solution is to construct a file system for each disk resource so that the quota 
can be enforced by the file system. For example, a task's sandbox would be a 
file system created from either a raw disk device, an LVM volume, or a file in 
the root file system. The task would receive an ENOSPC when it tries to write 
more data than requested.
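
To illustrate what hard enforcement would look like from the task's side, here 
is a minimal sketch in Python. The helper name try_write is hypothetical and 
not part of Mesos; it only shows that a full file system surfaces as an 
OSError whose errno is ENOSPC.

```python
import errno


def try_write(path, data):
    """Write data to path; return False if the file system is full.

    Hypothetical helper for illustration only. Any error other than
    ENOSPC is re-raised unchanged.
    """
    try:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()  # push buffered bytes to the OS so ENOSPC surfaces here
        return True
    except OSError as e:
        if e.errno == errno.ENOSPC:
            return False  # quota (file system size) exhausted
        raise
```

On Linux, writing to /dev/full is a convenient way to exercise the ENOSPC 
path without actually filling a disk.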

However, the ideal solution either assumes something that's not available on 
all platforms (raw device, LVM), or has unknown performance characteristics 
(filesystem on top of a filesystem). I am going to propose an intermediate 
solution here which is less intrusive and fits in our current code base quite 
well.

How about adding a new Isolator in MesosContainerizer called DiskQuotaIsolator? 
It periodically scans the disks (sandbox and persistent disks) using du and 
reports a Limitation (like CgroupsMemIsolator) once a container uses more disk 
than requested. The frequency and pace of du should be limited so that it does 
not cause too much interference with the running tasks.
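
A rough sketch of the scan-and-compare step, in Python rather than the C++ 
the real isolator would be written in. Everything here is an assumption for 
illustration: the Limitation dataclass, disk_usage_bytes, check_quota and the 
pause_every/pause_secs throttle are hypothetical names, and the scan sums 
apparent file sizes instead of shelling out to du (which by default counts 
allocated blocks, so the numbers can differ).

```python
import os
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class Limitation:
    """Hypothetical stand-in for the Limitation an isolator would report."""
    path: str
    used_bytes: int
    quota_bytes: int


def disk_usage_bytes(root, pause_every=1000, pause_secs=0.0):
    """Sum apparent file sizes under root, pausing periodically.

    The pause is a crude way to limit the pace of the scan so it does
    not compete too much with running tasks for disk I/O.
    """
    total = 0
    seen = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file disappeared while we were scanning
            total += st.st_size
            seen += 1
            if pause_secs > 0 and seen % pause_every == 0:
                time.sleep(pause_secs)
    return total


def check_quota(root, quota_bytes) -> Optional[Limitation]:
    """Return a Limitation if usage under root exceeds quota_bytes."""
    used = disk_usage_bytes(root)
    if used > quota_bytes:
        return Limitation(root, used, quota_bytes)
    return None
```

An isolator along these lines would call check_quota for each container's 
sandbox and persistent disks on a timer and hand any returned Limitation to 
the containerizer.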

As you can see, this is not a strict enforcement because a task can still go 
over its disk space limit. Let's call it soft enforcement. Hopefully, we can 
tune the du frequency so that a task cannot exceed its disk space limit too 
much.
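
To make the tuning trade-off concrete, here is a back-of-the-envelope bound in 
Python. It assumes a task writing at a constant rate that crosses its limit 
right after one scan has read it, so the overage is only noticed when the next 
scan finishes. The function name and parameters are hypothetical, and the 
bound ignores the time it takes to actually kill the container.

```python
def worst_case_overshoot_bytes(write_rate_bps, scan_interval_secs,
                               scan_duration_secs=0.0):
    """Rough upper bound on bytes written past the limit before detection.

    write_rate_bps: sustained write rate of the task, in bytes per second.
    scan_interval_secs: time between the starts of two consecutive scans.
    scan_duration_secs: how long a single (throttled) scan takes.
    """
    return write_rate_bps * (scan_interval_secs + scan_duration_secs)
```

For example, under these assumptions a task writing 50 MiB/s with a 60 second 
scan interval and a 5 second scan could overshoot by about 3.2 GiB, which 
shows why the interval has to be chosen with the expected write rates in mind.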

Another interesting thing to discuss here: what if a persistent disk goes 
over its limit? The container that has the persistent disk will get killed 
once du detects that it's over the limit. Now, what should the user do if 
they want to recover the data in the persistent disk? They cannot launch a 
task to recover the data because it will get killed as well. And currently, 
we do not support re-sizing persistent disks. That's a problem with any soft 
enforcement solution.


was (Author: jieyu):
Seems that this becomes quite important once we start to support persistent 
disks (MESOS-1554). In the pre persistent disk world, task's sandbox will be 
GCed after the task terminates, therefore, disk quota enforcement is an urgent 
issue (even if a task uses more disk than it requested, the overcommitted disk 
resources will be reclaimed once it terminates).

However, with persistent disks, this feature becomes necessary because 
persistent disk will not be auto-GCed. If a task writes more data to its 
persistent disk, the slave will slowly run out of disk space while the 
master/allocator still thinks that there are disk space available on the slave.

There are multiple ways to achieve disk quota enforcement in Mesos. The ideal 
solution is to construct file systems for each disk resource so that quota can 
be enforced by the file system. For example, a task's sandbox is actually a 
file system created from either a raw disk device, an LVM volume, or a file in 
the root file system. The task will receive an ENOSPC when it tries to write 
more data than requested.

However, the ideal solution either assumes something that's not available on 
all platforms (raw device, lvm), or have unknown performance characteristics 
(filesystem on top of a filesystem). I am going to propose an intermediate 
solution here which is less intrusive and fits in our current code base quite 
well.

How about adding a new Isolator in MesosContainerizer called DiskQuotaIsolator. 
It periodically scans the disks (sandbox and persistent disks) using du and 
reports a Limitation (like CgroupsMemIsolator) once a container uses more disk 
than requested. The frequency and pace of du should be limited so that it does 
not cause too much interferences to the running tasks.

As you can see, this is not a strict enforcement because a task can still go 
over its disk space limit. Let's call it soft enforcement. Hopefully, we can 
tune the du frequency so that a task cannot exceed its disk space limit too 
much.

Another interesting thing to discuss here is that what if a persistent disk 
goes over its limit? What will happen is the container having the persistent 
disk will get killed once the du detects that it's over limit. Now, what should 
the user do if he wants to recover the data in the persistent disk? He cannot 
launch a task to recover the data because it will get killed. And currently, we 
do not support re-sizing persistent disks. That's a problem with any soft 
enforcement solution.

> Enforce disk quota in MesosContainerizer
> ----------------------------------------
>
>                 Key: MESOS-1588
>                 URL: https://issues.apache.org/jira/browse/MESOS-1588
>             Project: Mesos
>          Issue Type: Improvement
>    Affects Versions: 0.20.0
>            Reporter: Ian Downes
>            Assignee: Ian Downes
>
> Once we have disk usage we should enforce this. Containers that exceed their 
> quota should be terminated, i.e., the filesystem isolator should set a 
> Limitation so the MesosContainerizer kills the container.
> Disk quota enforcement should be optional to permit a transition period where 
> disk usage is monitored before enabling enforcement.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
