[ 
https://issues.apache.org/jira/browse/MESOS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388782#comment-15388782
 ] 

Silas Snider commented on MESOS-5879:
-------------------------------------

Oh, sorry, I should have mentioned I was ruling out the agent because we have 
another cluster where we run the exact same agent config, except without the 
custom isolator (and machinery that goes along with it), and this problem has 
never occurred there, despite equivalent usage and load.

> cgroups/net_cls isolator causing agent recovery issues
> ------------------------------------------------------
>
>                 Key: MESOS-5879
>                 URL: https://issues.apache.org/jira/browse/MESOS-5879
>             Project: Mesos
>          Issue Type: Bug
>          Components: cgroups, isolation, slave
>            Reporter: Silas Snider
>            Assignee: Avinash Sridharan
>
> We run with 'cgroups/net_cls' in our isolator list, and when we restart any 
> agent process in a cluster running an experimental custom isolator as well, 
> the agents are unable to recover from checkpoint, because net_cls reports 
> that unknown orphan containers have duplicate net_cls handles.
> While this is a problem that needs to be solved (probably by fixing our 
> custom isolator), it's also a problem that the net_cls isolator fails 
> recovery just for duplicate handles in cgroups that it is literally about to 
> unconditionally destroy during recovery. Can this be fixed?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to