[ https://issues.apache.org/jira/browse/MESOS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390889#comment-15390889 ]
Silas Snider commented on MESOS-5879:
-------------------------------------
Our creation of net_cls sub-cgroups is an artifact of a third-party piece of
software that unilaterally creates all of its cgroups underneath the Mesos ones
in every subsystem.
Even setting that aside, Mesos can only practically guarantee that the cgroups
it creates itself don't have duplicate handles. It's entirely possible for
someone to, say, 'docker run' a container that reuses a net_cls handle in the
/docker hierarchy, which Mesos won't even check. It's therefore questionable
whether the net_cls isolator should ever descend into child cgroups.
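To make the "don't descend" suggestion concrete, here is a minimal Python sketch (not Mesos's actual C++ code) of a handle-duplication check that only looks at container cgroups directly under the Mesos root. It ignores any child cgroups a third-party tool may have created, as well as sibling hierarchies such as /docker. The in-memory hierarchy, the function names, and the paths are hypothetical. The only real detail is the kernel's classid encoding: a net_cls classid is a 32-bit value of the form 0xAAAABBBB, where AAAA is the major handle and BBBB is the minor handle.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory view of the net_cls hierarchy:
# cgroup path -> raw value as it would be read from net_cls.classid.
Hierarchy = Dict[str, int]
Handle = Tuple[int, int]


def decode_classid(classid: int) -> Handle:
    """Split a 32-bit classid (0xAAAABBBB) into (major, minor) handles."""
    return (classid >> 16) & 0xFFFF, classid & 0xFFFF


def top_level_containers(hierarchy: Hierarchy, root: str) -> Dict[str, int]:
    """Return only cgroups that are direct children of `root` (no descent)."""
    prefix = root.rstrip("/") + "/"
    result: Dict[str, int] = {}
    for path, classid in hierarchy.items():
        rest = path[len(prefix):]
        # Skip anything outside the root and anything nested deeper than one level.
        if path.startswith(prefix) and "/" not in rest:
            result[path] = classid
    return result


def find_duplicate_handles(hierarchy: Hierarchy, root: str) -> Dict[Handle, List[str]]:
    """Map each handle used by more than one top-level container to those paths."""
    seen: Dict[Handle, List[str]] = {}
    for path, classid in top_level_containers(hierarchy, root).items():
        seen.setdefault(decode_classid(classid), []).append(path)
    return {handle: paths for handle, paths in seen.items() if len(paths) > 1}


if __name__ == "__main__":
    hierarchy = {
        "/mesos/c1": 0x00100001,
        "/mesos/c1/thirdparty": 0x00100001,  # child cgroup created by another tool
        "/mesos/c2": 0x00100002,
        "/docker/abc": 0x00100001,  # outside the Mesos root entirely
    }
    print(find_duplicate_handles(hierarchy, "/mesos"))
```

With this scoping, the nested third-party cgroup and the /docker container never trigger a duplicate report. Only two Mesos-created containers sharing a handle would. As the comment points out, a check like this still says nothing about handle collisions outside the Mesos hierarchy.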
> cgroups/net_cls isolator causing agent recovery issues
> ------------------------------------------------------
>
> Key: MESOS-5879
> URL: https://issues.apache.org/jira/browse/MESOS-5879
> Project: Mesos
> Issue Type: Bug
> Components: cgroups, isolation, slave
> Reporter: Silas Snider
> Assignee: Avinash Sridharan
>
> We run with 'cgroups/net_cls' in our isolator list, and when we restart any
> agent process in a cluster running an experimental custom isolator as well,
> the agents are unable to recover from checkpoint, because net_cls reports
> that unknown orphan containers have duplicate net_cls handles.
> While this is a problem that needs to be solved (probably by fixing our
> custom isolator), it's also a problem that the net_cls isolator fails
> recovery merely because of duplicate handles in cgroups that it is about to
> destroy unconditionally during recovery anyway. Can this be fixed?
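The behavior the report asks for can be sketched as follows, assuming a simplified model of recovery. Handles are validated only for containers the agent checkpointed. Unknown orphans are set aside for destruction without their handles ever being reserved, so duplicates among them cannot fail recovery. This is a hypothetical Python illustration of the requested behavior, not Mesos's recovery code. `recover`, `RecoveryError`, and the container IDs are made-up names.

```python
from typing import Dict, Iterable, List, Set, Tuple

Handle = Tuple[int, int]  # (major, minor) net_cls handle


class RecoveryError(Exception):
    """Hypothetical error: two checkpointed (known) containers share a handle."""


def recover(
    cgroups: Dict[str, Handle],
    checkpointed: Iterable[str],
) -> Tuple[Dict[Handle, str], List[str]]:
    """Rebuild the handle table from known containers and collect orphans.

    `cgroups` maps container ID -> handle read from that container's cgroup.
    `checkpointed` lists the container IDs the agent checkpointed.
    Returns (handles in use -> owning container, orphans to destroy).
    """
    known: Set[str] = set(checkpointed)
    in_use: Dict[Handle, str] = {}
    orphans: List[str] = []
    for container_id, handle in cgroups.items():
        if container_id not in known:
            # Orphans are destroyed unconditionally, so their handles are never
            # reserved and duplicates among them are not treated as an error.
            orphans.append(container_id)
            continue
        if handle in in_use:
            raise RecoveryError(
                f"{container_id} and {in_use[handle]} share handle {handle}"
            )
        in_use[handle] = container_id
    return in_use, orphans


if __name__ == "__main__":
    in_use, orphans = recover(
        {"known": (16, 1), "orphan-a": (16, 2), "orphan-b": (16, 2)},
        checkpointed=["known"],
    )
    print(in_use, orphans)
```

A genuine conflict between two checkpointed containers is still reported, because that indicates real state corruption. The orphans' shared handle only matters to cgroups that are about to be removed.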
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)