[ https://issues.apache.org/jira/browse/MESOS-5879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15388520#comment-15388520 ]
Silas Snider commented on MESOS-5879:
-------------------------------------

Yes, it's remaining constant. These are the flags:
```
/usr/sbin/mesos-slave \
  --modules=file:///etc/mesos/mslave-modules.conf \
  --hooks=customHook \
  --master=... \
  --port=5051 \
  --isolation=cgroups/net_cls,cgroups/cpu,cgroups/mem,posix/disk,customIsolator \
  --enforce_container_disk_quota \
  --attributes=file:///etc/mesos/attributes.conf \
  --executor_registration_timeout=30mins \
  --executor_shutdown_grace_period=30mins \
  --recovery_timeout=60mins \
  --cgroups_hierarchy=/cgroup \
  --work_dir=/srv/mesos/work \
  --containerizers=mesos \
  --hadoop_home=/usr \
  --cgroups_limit_swap \
  --credential=/etc/mesos/credential \
  --slave_subsystems=cpu,memory,net_cls \
  --cgroups_net_cls_primary_handle=0x1111 \
  --cgroups_net_cls_secondary_handles=0x0001,0xFFFF \
```

> cgroups/net_cls isolator causing agent recovery issues
> ------------------------------------------------------
>
>                 Key: MESOS-5879
>                 URL: https://issues.apache.org/jira/browse/MESOS-5879
>             Project: Mesos
>          Issue Type: Bug
>          Components: cgroups, isolation, slave
>            Reporter: Silas Snider
>            Assignee: Avinash Sridharan
>
> We run with 'cgroups/net_cls' in our isolator list, and when we restart any
> agent process in a cluster running an experimental custom isolator as well,
> the agents are unable to recover from checkpoint, because net_cls reports
> that unknown orphan containers have duplicate net_cls handles.
> While this is a problem that needs to be solved (probably by fixing our
> custom isolator), it's also a problem that the net_cls isolator fails
> recovery just for duplicate handles in cgroups that it is literally about to
> unconditionally destroy during recovery. Can this be fixed?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
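
For illustration only, the behavior requested in the issue can be sketched roughly as follows. This is not the Mesos net_cls isolator code: `classid`, `recover_handles`, and the container IDs are hypothetical names. The sketch assumes only the kernel's `net_cls.classid` layout (`0xAAAABBBB`, a 16-bit major handle followed by a 16-bit minor handle) and the primary handle `0x1111` from the flags above. Duplicate handles fail recovery only for checkpointed containers, while orphan cgroups are queued for destruction without their handles being checked.

```python
from typing import Dict, List, Set, Tuple


def classid(primary: int, secondary: int) -> int:
    """Pack two 16-bit handles into the 0xAAAABBBB net_cls.classid layout."""
    if not (0 <= primary <= 0xFFFF and 0 <= secondary <= 0xFFFF):
        raise ValueError("handles must fit in 16 bits")
    return (primary << 16) | secondary


def recover_handles(
    known: Dict[str, int],
    orphans: Dict[str, int],
) -> Tuple[Set[int], List[str]]:
    """Hypothetical recovery step.

    known:   container ID -> classid for checkpointed containers.
    orphans: container ID -> classid for cgroups with no checkpointed state.

    Returns the set of handles still in use and the orphan IDs to destroy.
    """
    in_use: Set[int] = set()
    for container_id, handle in known.items():
        # A duplicate among containers we intend to keep is a real conflict.
        if handle in in_use:
            raise ValueError(
                f"duplicate net_cls handle {handle:#010x} for {container_id}"
            )
        in_use.add(handle)

    # Orphans are destroyed regardless, so their handles are never reserved
    # and duplicates among them are deliberately ignored.
    to_destroy = sorted(orphans)
    return in_use, to_destroy


if __name__ == "__main__":
    kept = {"c1": classid(0x1111, 0x0001)}
    stray = {"o1": classid(0x1111, 0x0002), "o2": classid(0x1111, 0x0002)}
    in_use, doomed = recover_handles(kept, stray)
    print(sorted(hex(h) for h in in_use), doomed)
```

Under these assumptions, two orphans sharing a handle no longer block recovery, while a genuine clash between checkpointed containers still raises an error.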