[ https://issues.apache.org/jira/browse/MESOS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15558382#comment-15558382 ]
Avinash Sridharan commented on MESOS-5879:
------------------------------------------

Great!! I would still keep this open. I will just mark this as "is blocked by MESOS-6035" and close it once MESOS-6035 is fixed. This is not exactly a duplicate, but a side effect of the recursive cgroups::get.

> cgroups/net_cls isolator causing agent recovery issues
> ------------------------------------------------------
>
>                 Key: MESOS-5879
>                 URL: https://issues.apache.org/jira/browse/MESOS-5879
>             Project: Mesos
>          Issue Type: Bug
>          Components: cgroups, isolation, slave
>            Reporter: Silas Snider
>            Assignee: Avinash Sridharan
>              Labels: mesosphere
>
> We run with 'cgroups/net_cls' in our isolator list, and when we restart any
> agent process in a cluster running an experimental custom isolator as well,
> the agents are unable to recover from checkpoint, because net_cls reports
> that unknown orphan containers have duplicate net_cls handles.
> While this is a problem that needs to be solved (probably by fixing our
> custom isolator), it's also a problem that the net_cls isolator fails
> recovery just for duplicate handles in cgroups that it is literally about to
> unconditionally destroy during recovery. Can this be fixed?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)