[ 
https://issues.apache.org/jira/browse/MESOS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392348#comment-15392348
 ] 

Avinash Sridharan commented on MESOS-5879:
------------------------------------------

Hi Silas,
I had a discussion about this with [~jieyu]. I agree that the net_cls isolator 
should not descend into child cgroups looking for net_cls handles; that is 
definitely a bug that we should fix. We can use this JIRA to fix that issue.
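
To illustrate the behaviour described above, here is a rough Python sketch (not 
the isolator's actual C++ code) of collecting handles from only the immediate 
child cgroups of a root, without recursing into nested cgroups. The function 
name and the directory layout are assumptions made for the example; the only 
kernel detail relied on is that each cgroup in the net_cls hierarchy exposes 
its handle in a `net_cls.classid` file as a decimal integer.

```python
import os


def read_top_level_handles(cgroup_root):
    """Map each immediate child cgroup of cgroup_root to its net_cls classid.

    Only direct children are inspected; nested cgroups are skipped on purpose,
    so a handle set by something else deeper in the tree is never seen.
    (Hypothetical helper, for illustration only.)
    """
    handles = {}
    with os.scandir(cgroup_root) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            classid_path = os.path.join(entry.path, "net_cls.classid")
            try:
                with open(classid_path) as f:
                    handles[entry.name] = int(f.read().strip())
            except (FileNotFoundError, ValueError):
                # No readable handle for this cgroup; leave it out.
                continue
    return handles
```

With this approach a duplicate handle in a grandchild cgroup would not be 
reported at all, since the scan stops one level below the root.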

As far as reuse of net_cls handles by other orchestrators (such as docker) in 
a different hierarchy is concerned, the expectation is that the operator is 
responsible for partitioning the handle ranges between the different 
orchestrators by specifying the primary handles and the secondary handle range.
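
As a rough illustration of that partitioning, the sketch below packs a primary 
and a secondary handle into the 32-bit classid the kernel uses (upper 16 bits 
primary, lower 16 bits secondary) and checks whether two allocations collide. 
The function names and the example ranges for the two orchestrators are made 
up for this example; they are not taken from Mesos or docker.

```python
def make_classid(primary, secondary):
    """Pack a 16-bit primary and 16-bit secondary handle into a 32-bit classid."""
    if not (0 <= primary <= 0xFFFF and 0 <= secondary <= 0xFFFF):
        raise ValueError("handles must fit in 16 bits")
    return (primary << 16) | secondary


def allocations_overlap(a, b):
    """Return True if two allocations can produce the same classid.

    Each allocation is (primary, first_secondary, last_secondary), with the
    secondary range inclusive at both ends.
    """
    a_primary, a_lo, a_hi = a
    b_primary, b_lo, b_hi = b
    if a_primary != b_primary:
        return False
    return a_lo <= b_hi and b_lo <= a_hi


# Hypothetical operator-chosen allocations sharing one primary handle.
mesos = (0x0010, 0x0001, 0x00FF)
docker = (0x0010, 0x0100, 0x01FF)
```

Here `allocations_overlap(mesos, docker)` is False because the secondary 
ranges are disjoint, so the two orchestrators cannot hand out the same handle.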

> cgroups/net_cls isolator causing agent recovery issues
> ------------------------------------------------------
>
>                 Key: MESOS-5879
>                 URL: https://issues.apache.org/jira/browse/MESOS-5879
>             Project: Mesos
>          Issue Type: Bug
>          Components: cgroups, isolation, slave
>            Reporter: Silas Snider
>            Assignee: Avinash Sridharan
>
> We run with 'cgroups/net_cls' in our isolator list, and when we restart any 
> agent process in a cluster running an experimental custom isolator as well, 
> the agents are unable to recover from checkpoint, because net_cls reports 
> that unknown orphan containers have duplicate net_cls handles.
> While this is a problem that needs to be solved (probably by fixing our 
> custom isolator), it's also a problem that the net_cls isolator fails 
> recovery just for duplicate handles in cgroups that it is literally about to 
> unconditionally destroy during recovery. Can this be fixed?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
