[ https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381694#comment-16381694 ]
Harold Dost III commented on MESOS-6575:
----------------------------------------

[~jamespeach] While looking at this ticket, I wasn't sure whether we'd want to break it down into multiple tickets, but here are my thoughts.

At the flag level, we could provide two settings:
- {{xfs_use_disk_reservation_as_soft_limit}} - true/false (default: false). When set, the reserved disk space is applied as a soft limit instead of a hard limit, which leads us to the next flag.
- {{xfs_kill_on_soft_limit_violation}} - true/false (default: false). This allows it to be configured at a global level so that the task is killed once the grace period (configured by sysadmins with {{xfs_quota}}) is over.

With all of that being said, on a resource level, we could have two parameters:
- {{soft_disk_limit}} - This would override the flag {{xfs_use_disk_reservation_as_soft_limit}}, such that if a soft limit is specified it provides exactly whatever space is desired for both.
- {{kill_on_soft_limit_violation}} - This would override the global flag {{xfs_kill_on_soft_limit_violation}} on a per-task basis.

Optionally, I was thinking that we could introduce another flag (at the risk of making this even more complicated) that would be a default offset for soft limits. Something like {{xfs_kill_soft_quota_diff_bytes}}, which would be used to provide a global soft limit. It would also be overridden by {{soft_disk_limit}}, and would be ignored if {{xfs_use_disk_reservation_as_soft_limit}} is set. The idea behind the "diff_bytes" is that you'd take the hard limit of any given task and subtract that number of bytes to create a soft limit below the hard limit.
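To make the proposed precedence concrete, here is a rough sketch of how the effective soft limit and kill behaviour might be resolved. This is only an illustration of the ordering described above, not Mesos code: the {{AgentFlags}} and {{DiskResource}} classes and both function names are hypothetical, the hard limit is assumed to equal the disk reservation, and clamping the offset result at zero is my own assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AgentFlags:
    # Proposed agent-level flags from this comment (hypothetical container type).
    xfs_use_disk_reservation_as_soft_limit: bool = False
    xfs_kill_on_soft_limit_violation: bool = False
    xfs_kill_soft_quota_diff_bytes: Optional[int] = None


@dataclass(frozen=True)
class DiskResource:
    # Proposed per-resource parameters; None means "not specified".
    reservation_bytes: int
    soft_disk_limit: Optional[int] = None
    kill_on_soft_limit_violation: Optional[bool] = None


def resolve_soft_limit(flags: AgentFlags, disk: DiskResource) -> Optional[int]:
    """Return the effective soft limit in bytes, or None if there is none."""
    # 1. A per-resource soft limit overrides every agent flag.
    if disk.soft_disk_limit is not None:
        return disk.soft_disk_limit
    # 2. Otherwise the reservation itself may act as the soft limit,
    #    in which case the diff_bytes offset is ignored.
    if flags.xfs_use_disk_reservation_as_soft_limit:
        return disk.reservation_bytes
    # 3. Otherwise derive the soft limit by subtracting the global offset
    #    from the hard limit (assumed here to be the reservation).
    if flags.xfs_kill_soft_quota_diff_bytes is not None:
        # Clamping at zero is an assumption, not part of the proposal.
        return max(0, disk.reservation_bytes - flags.xfs_kill_soft_quota_diff_bytes)
    return None


def resolve_kill(flags: AgentFlags, disk: DiskResource) -> bool:
    """Return whether a soft limit violation should kill the task."""
    # The per-task setting wins when present; otherwise use the global flag.
    if disk.kill_on_soft_limit_violation is not None:
        return disk.kill_on_soft_limit_violation
    return flags.xfs_kill_on_soft_limit_violation
```

For example, with {{xfs_kill_soft_quota_diff_bytes}} set to 100 and a 1000 byte reservation, this sketch yields a soft limit of 900 bytes unless the task sets {{soft_disk_limit}} itself.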
> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> ----------------------------------------------------------------------
>
> Key: MESOS-6575
> URL: https://issues.apache.org/jira/browse/MESOS-6575
> Project: Mesos
> Issue Type: Task
> Components: agent, containerization
> Reporter: Santhosh Kumar Shanmugham
> Assignee: James Peach
> Priority: Major
>
> Unlike the {{disk/du}} isolator, which sends a {{ContainerLimitation}} protobuf
> when the executor exceeds the quota, the {{disk/xfs}} isolator, which relies on
> XFS's internal quota enforcement, silently fails the {{write}} operation
> that causes the quota limit to be exceeded, without surfacing the quota
> breach information.
> This task is to change the `disk/xfs` isolator so that a
> {{ContainerLimitation}} message is triggered when the quota is exceeded.
> This feature will rely on the underlying filesystem being mounted with
> {{pqnoenforce}} (accounting-only mode), so that XFS does not silently cause
> an {{EDQUOT}} error on writes that cause the quota to be exceeded. The
> isolator can then track the disk quota via {{xfs_quota}}, much like
> {{disk/du}} does using {{du}}, every {{container_disk_watch_interval}}, and
> surface the quota-exceeded event via a {{ContainerLimitation}} protobuf,
> causing the executor to be terminated. This feature can then be turned on/off
> via the existing {{enforce_container_disk_quota}} option.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)