[
https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381694#comment-16381694
]
Harold Dost III commented on MESOS-6575:
----------------------------------------
[~jamespeach]
While looking at this ticket, I wasn't sure whether we'd want to break it down
into multiple tickets, but here are my thoughts.
At the flag level, we could provide two settings:
- {{xfs_use_disk_reservation_as_soft_limit}} - true/false (default: false).
When true, the reserved space is enforced as a soft limit instead of a hard
limit, which leads us to the next flag.
- {{xfs_kill_on_soft_limit_violation}} - true/false (default: false). This
lets it be configured at a global level so that the task is killed once the
grace period (configured by sysadmins with {{xfs_quota}}) is over.
With all of that being said, on a resource level, we could have two parameters:
- {{soft_disk_limit}} - This would override the flag
{{xfs_use_disk_reservation_as_soft_limit}}, so that if a soft limit is
specified, it provides exactly whatever space is desired for both.
- {{kill_on_soft_limit_violation}} - This would override the global flag
{{xfs_kill_on_soft_limit_violation}} on a per-task basis.
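The override order described above could be sketched roughly as follows, in Python for brevity (the isolator itself is C++). The flag and parameter names are the ones proposed in this comment; the {{AgentFlags}} and {{DiskResource}} types and the {{resolve}} function are hypothetical. It also assumes that when {{soft_disk_limit}} is given the reservation stays in place as the hard limit, which the proposal does not spell out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AgentFlags:
    # Proposed agent-level flags; both default to false.
    xfs_use_disk_reservation_as_soft_limit: bool = False
    xfs_kill_on_soft_limit_violation: bool = False


@dataclass
class DiskResource:
    reserved_bytes: int
    # Proposed per-resource parameters; None means "not specified".
    soft_disk_limit: Optional[int] = None
    kill_on_soft_limit_violation: Optional[bool] = None


def resolve(flags: AgentFlags,
            disk: DiskResource) -> Tuple[Optional[int], Optional[int], bool]:
    """Return (soft_limit, hard_limit, kill_on_soft_violation)."""
    if disk.soft_disk_limit is not None:
        # Per-resource value wins over the agent flag.
        # Assumption: the reservation remains the hard limit.
        soft, hard = disk.soft_disk_limit, disk.reserved_bytes
    elif flags.xfs_use_disk_reservation_as_soft_limit:
        # The reservation becomes a soft limit; no hard limit is set.
        soft, hard = disk.reserved_bytes, None
    else:
        # Current behaviour: the reservation is a hard limit.
        soft, hard = None, disk.reserved_bytes

    # Per-task setting overrides the global flag when present.
    if disk.kill_on_soft_limit_violation is not None:
        kill = disk.kill_on_soft_limit_violation
    else:
        kill = flags.xfs_kill_on_soft_limit_violation

    return soft, hard, kill
```

With both flags left at their defaults, this keeps today's behaviour: a hard limit equal to the reservation and no kill on soft-limit violation.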
Optionally, I was thinking that we could introduce another flag (not that I
want to make this even more complicated) which would be a default offset for
soft limits. Something like {{xfs_kill_soft_quota_diff_bytes}}, which would be
used to provide a global soft limit. It would also be overridden by
{{soft_disk_limit}}, and would be ignored if
{{xfs_use_disk_reservation_as_soft_limit}} is set. The idea behind the
"diff_bytes" is that you'd take the hard limit of any given task and
subtract that number of bytes to create a soft limit below the hard limit.
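To make the "diff_bytes" arithmetic concrete, here is a small Python sketch. The function name {{soft_limit_from_offset}} and its parameters are hypothetical, and clamping the result at zero when the offset exceeds the hard limit is my assumption rather than part of the proposal.

```python
from typing import Optional

MiB = 1024 * 1024


def soft_limit_from_offset(hard_limit_bytes: int,
                           diff_bytes: int,
                           use_reservation_as_soft_limit: bool = False,
                           soft_disk_limit: Optional[int] = None) -> int:
    """Compute a task's soft limit under the proposed offset flag."""
    if soft_disk_limit is not None:
        # A per-resource soft limit overrides the global offset.
        return soft_disk_limit
    if use_reservation_as_soft_limit:
        # The offset is ignored; the reservation itself is the soft limit.
        return hard_limit_bytes
    # Otherwise place the soft limit diff_bytes below the hard limit.
    # Assumption: never go below zero.
    return max(hard_limit_bytes - diff_bytes, 0)


# A 1 GiB hard limit with a 100 MiB offset gives a 924 MiB soft limit.
print(soft_limit_from_offset(1024 * MiB, 100 * MiB) // MiB)
```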
> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> ----------------------------------------------------------------------
>
> Key: MESOS-6575
> URL: https://issues.apache.org/jira/browse/MESOS-6575
> Project: Mesos
> Issue Type: Task
> Components: agent, containerization
> Reporter: Santhosh Kumar Shanmugham
> Assignee: James Peach
> Priority: Major
>
> Unlike the {{disk/du}} isolator, which sends a {{ContainerLimitation}} protobuf
> when the executor exceeds the quota, the {{disk/xfs}} isolator, which relies on
> XFS's internal quota enforcement, silently fails the {{write}} operation
> that causes the quota limit to be exceeded, without surfacing the quota
> breach information.
> This task is to change the `disk/xfs` isolator so that a
> {{ContainerLimitation}} message is triggered when the quota is exceeded.
> This feature will rely on the underlying filesystem being mounted with
> {{pqnoenforce}} (accounting-only mode), so that XFS does not silently cause
> an {{EDQUOT}} error on writes that cause the quota to be exceeded. The
> isolator can then track the disk quota via {{xfs_quota}}, very much like
> {{disk/du}} using {{du}}, every {{container_disk_watch_interval}} and surface
> the quota-exceeded event via a {{ContainerLimitation}} protobuf,
> causing the executor to be terminated. This feature can then be turned on/off
> via the existing {{enforce_container_disk_quota}} option.