[ https://issues.apache.org/jira/browse/MESOS-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jie Yu updated MESOS-1649:
--------------------------
Story Points: 3
> Network isolator should tolerate slave crashes while doing isolate/cleanup.
> ---------------------------------------------------------------------------
>
> Key: MESOS-1649
> URL: https://issues.apache.org/jira/browse/MESOS-1649
> Project: Mesos
> Issue Type: Bug
> Reporter: Jie Yu
> Assignee: Jie Yu
> Fix For: 0.20.0
>
>
> A slave may crash while we are installing or removing filters. Slave recovery
> for the network isolator should tolerate such partially installed filters.
> We also want to avoid leaking filters on the host's eth0 and lo interfaces.
> The current code cannot tolerate this and may therefore fail with the
> following error:
> {noformat}
> Failed to perform recovery: Collect failed: Failed to recover container
> d409a100-2afb-497c-864f-fe3002cf65d9 with pid 50405: No ephemeral ports found
> To remedy this do as follows:
> Step 1: rm -f /var/lib/mesos/meta/slaves/latest
> This ensures slave doesn't recover old live executors.
> Step 2: Restart the slave.
> {noformat}
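The recovery requirement in the issue can be illustrated with a small model. This is a hypothetical Python sketch, not Mesos code. `FilterTable`, `recover_container`, the port ranges, and the assumption that each container owns exactly one filter on host eth0 and one on host lo are all illustrative. The model shows two ideas. First, recovery classifies the filters it finds instead of failing with an error like "No ephemeral ports found". Second, removal is idempotent, so cleanup interrupted by a crash can simply be run again.

```python
from enum import Enum

# Assumption: the isolator installs one filter per container on each of these
# host interfaces (the issue mentions host eth0 and host lo).
INTERFACES = ("eth0", "lo")


class Recovery(Enum):
    RECOVERED = "recovered"
    CLEANED_UP_PARTIAL = "cleaned_up_partial"
    NOTHING_INSTALLED = "nothing_installed"


class FilterTable:
    """Hypothetical stand-in for the filters installed on host interfaces.

    Each filter is keyed by (interface, port range) for one container.
    """

    def __init__(self):
        self._filters = set()

    def install(self, iface, port_range):
        self._filters.add((iface, port_range))

    def remove(self, iface, port_range):
        # Idempotent: removing a filter that is already gone is not an error,
        # so cleanup can safely be re-run after a crash.
        self._filters.discard((iface, port_range))

    def exists(self, iface, port_range):
        return (iface, port_range) in self._filters

    def __len__(self):
        return len(self._filters)


def recover_container(table, port_range):
    """Classify a container's filters instead of failing on missing ones."""
    present = [iface for iface in INTERFACES if table.exists(iface, port_range)]
    if len(present) == len(INTERFACES):
        return Recovery.RECOVERED
    if not present:
        return Recovery.NOTHING_INSTALLED
    # Partially installed (or partially removed): delete the leftovers so
    # that no filter leaks on eth0 or lo.
    for iface in INTERFACES:
        table.remove(iface, port_range)
    return Recovery.CLEANED_UP_PARTIAL


table = FilterTable()
complete = (32768, 33791)  # hypothetical ephemeral port range
partial = (33792, 34815)   # hypothetical ephemeral port range

# Container A: both filters were installed before the crash.
table.install("eth0", complete)
table.install("lo", complete)

# Container B: the slave crashed after installing on eth0 but before lo.
table.install("eth0", partial)

print(recover_container(table, complete))  # Recovery.RECOVERED
print(recover_container(table, partial))   # Recovery.CLEANED_UP_PARTIAL
print(len(table))                          # 2
```

Because `remove` ignores filters that are already absent, the same cleanup path handles a crash during installation and a crash during removal.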
--
This message was sent by Atlassian JIRA
(v6.2#6252)