[ 
https://issues.apache.org/jira/browse/MESOS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388699#comment-15388699
 ] 

Silas Snider commented on MESOS-5879:
-------------------------------------

I cannot dump the cgroups -- they no longer exist because we disabled this 
feature and cleaned them up to get prod working. I can definitely not dump the 
agent logs, sorry.

However, this is all part of an investigation I'm doing internally -- we don't 
suspect that the mesos agent is causing the duplicate handle issue, we think 
it's probably something with our custom isolator or similar. This JIRA is to 
track the effort around not aborting recovery due to this sort of issue.

> cgroups/net_cls isolator causing agent recovery issues
> ------------------------------------------------------
>
>                 Key: MESOS-5879
>                 URL: https://issues.apache.org/jira/browse/MESOS-5879
>             Project: Mesos
>          Issue Type: Bug
>          Components: cgroups, isolation, slave
>            Reporter: Silas Snider
>            Assignee: Avinash Sridharan
>
> We run with 'cgroups/net_cls' in our isolator list, and when we restart any 
> agent process in a cluster running an experimental custom isolator as well, 
> the agents are unable to recover from checkpoint, because net_cls reports 
> that unknown orphan containers have duplicate net_cls handles.
> While this is a problem that needs to be solved (probably by fixing our 
> custom isolator), it's also a problem that the net_cls isolator fails 
> recovery just for duplicate handles in cgroups that it is literally about to 
> unconditionally destroy during recovery. Can this be fixed?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to