[ https://issues.apache.org/jira/browse/MESOS-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jie Yu updated MESOS-1649:
--------------------------
    Story Points: 3

> Network isolator should tolerate slave crashes while doing isolate/cleanup.
> ---------------------------------------------------------------------------
>
>                 Key: MESOS-1649
>                 URL: https://issues.apache.org/jira/browse/MESOS-1649
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Jie Yu
>            Assignee: Jie Yu
>             Fix For: 0.20.0
>
>
> A slave may crash while we are installing/removing filters. The slave
> recovery for the network isolator should tolerate those partially installed
> filters. Also, we want to avoid leaking a filter on host eth0 and host lo.
> The current code cannot tolerate that, thus may cause the following error:
> {noformat}
> Failed to perform recovery: Collect failed: Failed to recover container
> d409a100-2afb-497c-864f-fe3002cf65d9 with pid 50405: No ephemeral ports found
> To remedy this do as follows:
> Step 1: rm -f /var/lib/mesos/meta/slaves/latest
>         This ensures slave doesn't recover old live executors.
> Step 2: Restart the slave.
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.2#6252)