[
https://issues.apache.org/jira/browse/MESOS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388397#comment-15388397
]
Avinash Sridharan commented on MESOS-5879:
------------------------------------------
Could you clarify whether the custom isolator you are testing is also trying to
manipulate the net_cls handles, or whether some other entity in the environment
is? If so, that is a bigger problem. I am not comfortable with allowing the
isolator to proceed when it has detected a misallocation of handles. Doing so
can have unexpected consequences, which is why the isolator reports an error
and bails out rather than trying to live with the problem.
> cgroups/net_cls isolator causing agent recovery issues
> ------------------------------------------------------
>
> Key: MESOS-5879
> URL: https://issues.apache.org/jira/browse/MESOS-5879
> Project: Mesos
> Issue Type: Bug
> Components: cgroups, isolation, slave
> Reporter: Silas Snider
> Assignee: Avinash Sridharan
>
> We run with 'cgroups/net_cls' in our isolator list, and when we restart any
> agent process in a cluster running an experimental custom isolator as well,
> the agents are unable to recover from checkpoint, because net_cls reports
> that unknown orphan containers have duplicate net_cls handles.
> While this is a problem that needs to be solved (probably by fixing our
> custom isolator), it's also a problem that the net_cls isolator fails
> recovery just for duplicate handles in cgroups that it is literally about to
> unconditionally destroy during recovery. Can this be fixed?