[
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16255758#comment-16255758
]
Pierre Cheynier commented on MESOS-6575:
----------------------------------------
We may also be interested in this feature.
XFS offers real enforcement, which is what makes it attractive (it prevents
someone from using fallocate to claim the whole disk).
However, many applications are not written to handle EDQUOT correctly (think of
what happens in a non-containerized environment), or cannot react preventively
because they are not directly aware of what is happening (for example, a
companion process is filling up the disk by writing logs). So it is better to
actually kill the task, as the oom-killer does when using {{cgroups/memory}}.
So our feeling is that we could leverage the XFS soft limit, and possibly its
grace-period timer, to introduce more modularity:
* enforcement would have to be enabled at the agent level (probably by reusing
{{enforce_container_disk_quota}}, as suggested here)
* the soft limit would be customizable (e.g. soft limit = hard limit - 2%)
* a collector would watch for the container reaching the soft limit and then
kill it, much as {{cgroups/memory}} does indirectly by relying on the Linux
oom-killer (and as {{disk/du}} did for disk usage).
What do you think?
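To make the idea concrete, here is a minimal Python sketch of the soft-limit check. It is not Mesos code: {{soft_limit_bytes}}, {{should_kill}}, the 2% default margin and the grace period parameter are names and values assumed purely for illustration.

```python
from typing import Optional


def soft_limit_bytes(hard_limit: int, margin_percent: float = 2.0) -> int:
    """Soft limit = hard limit minus a margin (assumed default: 2%)."""
    if not 0 <= margin_percent < 100:
        raise ValueError("margin_percent must be in [0, 100)")
    return int(hard_limit * (100 - margin_percent) / 100)


def should_kill(used: int, hard_limit: int, over_since: Optional[float],
                now: float, grace_seconds: float = 0.0,
                margin_percent: float = 2.0) -> bool:
    """Decide whether the collector should kill the container.

    over_since is the time at which usage was first seen at or above the
    soft limit, or None if this is the first such observation.
    """
    if used < soft_limit_bytes(hard_limit, margin_percent):
        return False
    if over_since is None:
        # Just crossed the soft limit: kill only if no grace period is set.
        return grace_seconds <= 0
    return now - over_since >= grace_seconds
```

With a grace period of zero the container is killed as soon as it reaches the soft limit; a non-zero value would play the role of the timer mentioned above.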
> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> ----------------------------------------------------------------------
>
> Key: MESOS-6575
> URL: https://issues.apache.org/jira/browse/MESOS-6575
> Project: Mesos
> Issue Type: Task
> Components: agent, containerization
> Reporter: Santhosh Kumar Shanmugham
>
> Unlike the {{disk/du}} isolator, which sends a {{ContainerLimitation}} protobuf
> when the executor exceeds the quota, the {{disk/xfs}} isolator, which relies on
> XFS's internal quota enforcement, silently fails the {{write}} operation
> that would cause the quota limit to be exceeded, without surfacing the quota
> breach information.
> This task is to change the {{disk/xfs}} isolator so that a
> {{ContainerLimitation}} message is triggered when the quota is exceeded.
> This feature will rely on the underlying filesystem being mounted with
> {{pqnoenforce}} (accounting-only mode), so that XFS does not silently cause
> an {{EDQUOT}} error on writes that would cause the quota to be exceeded. The
> isolator can then track the disk quota via {{xfs_quota}}, much as
> {{disk/du}} does using {{du}}, every {{container_disk_watch_interval}} and
> surface the quota-exceeded event via a {{ContainerLimitation}} protobuf,
> causing the executor to be terminated. This feature can then be turned on/off
> via the existing {{enforce_container_disk_quota}} option.
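
As a rough illustration of the accounting-only approach described in the issue, the polling loop might look like the Python sketch below. Everything here is hypothetical: {{Limitation}} only stands in for the {{ContainerLimitation}} protobuf, {{read_usage}} is an injected callable standing in for however usage is actually read from XFS, and {{max_polls}} exists only so the loop can be bounded in tests.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Limitation:
    """Illustrative stand-in for a ContainerLimitation message."""
    container_id: str
    used_bytes: int
    limit_bytes: int
    message: str


def check_quota(container_id: str, used_bytes: int,
                limit_bytes: int) -> Optional[Limitation]:
    """Return a Limitation if usage exceeds the quota, otherwise None."""
    if used_bytes <= limit_bytes:
        return None
    return Limitation(
        container_id, used_bytes, limit_bytes,
        f"Disk usage ({used_bytes} B) exceeds quota ({limit_bytes} B)")


def watch(container_id: str, limit_bytes: int,
          read_usage: Callable[[str], int], interval_seconds: float,
          max_polls: Optional[int] = None,
          sleep: Callable[[float], None] = time.sleep) -> Optional[Limitation]:
    """Poll usage once per interval until the quota is exceeded.

    Returns the Limitation to surface, or None if max_polls is reached first.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        limitation = check_quota(
            container_id, read_usage(container_id), limit_bytes)
        if limitation is not None:
            return limitation
        polls += 1
        sleep(interval_seconds)
    return None
```

Here {{interval_seconds}} corresponds to {{container_disk_watch_interval}}, and the caller would only start the watcher when {{enforce_container_disk_quota}} is enabled.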