[ https://issues.apache.org/jira/browse/MESOS-6575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16381694#comment-16381694 ]

Harold Dost III commented on MESOS-6575:
----------------------------------------

[~jamespeach]

While looking at this ticket, I wasn't sure whether we'd want to break it down 
into multiple tickets, but here are my thoughts.

At the flag level, I'd propose two settings:
 - {{xfs_use_disk_reservation_as_soft_limit}} - true/false (default: false). 
When set, the reserved disk space becomes a soft limit instead of a hard 
limit, which leads to the next flag.
 - {{xfs_kill_on_soft_limit_violation}} - true/false (default: false). This 
lets operators configure, at a global level, that a task is killed once its 
soft-limit grace period (configured by sysadmins with {{xfs_quota}}) is over.

On top of that, at the resource level we could have two parameters:
 - {{soft_disk_limit}} - This would override the 
{{xfs_use_disk_reservation_as_soft_limit}} flag, so that when a soft limit is 
specified, both the soft limit and the hard limit give the task exactly the 
space it asked for.
 - {{kill_on_soft_limit_violation}} - This would override the global flag 
{{xfs_kill_on_soft_limit_violation}} on a per-task basis.

Optionally (at the risk of making this even more complicated), we could 
introduce another flag that sets a default offset for soft limits, something 
like {{xfs_kill_soft_quota_diff_bytes}}, which would be used to provide a 
global soft limit. It would also be overridden by {{soft_disk_limit}}, and 
would be ignored if {{xfs_use_disk_reservation_as_soft_limit}} is set. The 
idea behind "diff_bytes" is that you'd take the hard limit of any given task 
and subtract that many bytes to create a soft limit below the hard limit. For 
example, a task with a 10 GiB hard limit and a 1 GiB diff would get a 9 GiB 
soft limit.
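To make the precedence between these settings concrete, here is a rough C++ sketch of how the isolator might resolve the effective soft limit, hard limit, and kill behaviour for a single task. The flag and field names are the proposed ones above (none of them exist in Mesos today), and {{resolveQuota}} and the structs are hypothetical, invented for illustration. Two details are assumptions of this sketch: an explicit {{soft_disk_limit}} keeps the disk reservation as the hard limit, and an offset at least as large as the reservation yields no soft limit rather than underflowing.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical agent-level flags as proposed above; not real Mesos flags.
struct XfsQuotaFlags {
  bool xfs_use_disk_reservation_as_soft_limit = false;
  bool xfs_kill_on_soft_limit_violation = false;
  std::optional<std::uint64_t> xfs_kill_soft_quota_diff_bytes;
};

// Hypothetical per-task overrides carried on the disk resource.
struct TaskDiskOverrides {
  std::optional<std::uint64_t> soft_disk_limit;
  std::optional<bool> kill_on_soft_limit_violation;
};

// Effective quota for one task; std::nullopt means "no limit of this kind".
struct EffectiveQuota {
  std::optional<std::uint64_t> soft_bytes;
  std::optional<std::uint64_t> hard_bytes;
  bool kill_on_soft_violation = false;
};

EffectiveQuota resolveQuota(std::uint64_t reservation_bytes,
                            const XfsQuotaFlags& flags,
                            const TaskDiskOverrides& task) {
  EffectiveQuota quota;

  // The per-task kill setting wins over the global flag when present.
  quota.kill_on_soft_violation = task.kill_on_soft_limit_violation.value_or(
      flags.xfs_kill_on_soft_limit_violation);

  if (task.soft_disk_limit) {
    // 1. An explicit per-task soft limit overrides both global soft-limit
    //    settings; the reservation stays the hard limit (assumption).
    quota.soft_bytes = *task.soft_disk_limit;
    quota.hard_bytes = reservation_bytes;
  } else if (flags.xfs_use_disk_reservation_as_soft_limit) {
    // 2. The reservation becomes a soft limit with no hard limit; the
    //    diff_bytes offset is ignored in this mode.
    quota.soft_bytes = reservation_bytes;
  } else if (flags.xfs_kill_soft_quota_diff_bytes) {
    // 3. Derive the soft limit by subtracting the offset from the hard limit.
    quota.hard_bytes = reservation_bytes;
    const std::uint64_t diff = *flags.xfs_kill_soft_quota_diff_bytes;
    if (diff < reservation_bytes) {
      quota.soft_bytes = reservation_bytes - diff;
    }
    // Otherwise leave the soft limit unset instead of underflowing
    // (assumption).
  } else {
    // 4. Current behaviour: the reservation is a hard limit only.
    quota.hard_bytes = reservation_bytes;
  }

  return quota;
}
```

Here std::nullopt stands for "no limit of this kind"; a real implementation would translate it to XFS's convention that a limit of zero means unlimited.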

> Change `disk/xfs` isolator to terminate executor when it exceeds quota
> ----------------------------------------------------------------------
>
>                 Key: MESOS-6575
>                 URL: https://issues.apache.org/jira/browse/MESOS-6575
>             Project: Mesos
>          Issue Type: Task
>          Components: agent, containerization
>            Reporter: Santhosh Kumar Shanmugham
>            Assignee: James Peach
>            Priority: Major
>
> Unlike the {{disk/du}} isolator, which sends a {{ContainerLimitation}} 
> protobuf when the executor exceeds the quota, the {{disk/xfs}} isolator, 
> which relies on XFS's internal quota enforcement, silently fails the 
> {{write}} operation that would exceed the quota limit, without surfacing 
> the quota breach information.
> This task is to change the `disk/xfs` isolator so that a 
> {{ContainerLimitation}} message is triggered when the quota is exceeded. 
> This feature will rely on the underlying filesystem being mounted with 
> {{pqnoenforce}} (accounting-only mode), so that XFS does not silently 
> return an {{EDQUOT}} error on writes that would exceed the quota. The 
> isolator can then track disk usage against the quota via {{xfs_quota}} 
> every {{container_disk_watch_interval}}, much like {{disk/du}} does using 
> {{du}}, and surface the quota-exceeded event via a {{ContainerLimitation}} 
> protobuf, causing the executor to be terminated. This behaviour can be 
> enabled or disabled via the existing {{enforce_container_disk_quota}} option.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
