If 'cleanup' is invoked twice, that might cause signal being sent twice? - Jie
On Tue, Jan 24, 2017 at 2:58 PM, <vinodk...@apache.org> wrote: > Repository: mesos > Updated Branches: > refs/heads/master 26923a1de -> 7f83b6e34 > > > Fixed bug allowing IOSwitchboard::connect() after container destruction. > > Review: https://reviews.apache.org/r/55810/ > > > Project: http://git-wip-us.apache.org/repos/asf/mesos/repo > Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/7f83b6e3 > Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/7f83b6e3 > Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/7f83b6e3 > > Branch: refs/heads/master > Commit: 7f83b6e34a2adef79a427e54f7a2dd4d5afbc1f5 > Parents: 26923a1 > Author: Kevin Klues <klue...@gmail.com> > Authored: Tue Jan 24 14:58:33 2017 -0800 > Committer: Vinod Kone <vinodk...@gmail.com> > Committed: Tue Jan 24 14:58:33 2017 -0800 > > ---------------------------------------------------------------------- > src/slave/containerizer/mesos/io/switchboard.cpp | 17 ++++++++++++++--- > 1 file changed, 14 insertions(+), 3 deletions(-) > ---------------------------------------------------------------------- > > > http://git-wip-us.apache.org/repos/asf/mesos/blob/7f83b6e3/ > src/slave/containerizer/mesos/io/switchboard.cpp > ---------------------------------------------------------------------- > diff --git a/src/slave/containerizer/mesos/io/switchboard.cpp > b/src/slave/containerizer/mesos/io/switchboard.cpp > index 4b134f8..1b8f490 100644 > --- a/src/slave/containerizer/mesos/io/switchboard.cpp > +++ b/src/slave/containerizer/mesos/io/switchboard.cpp > @@ -695,7 +695,7 @@ Future<http::Connection> IOSwitchboard::connect( > }) > .then(defer(self(), [=]() -> Future<http::Connection> { > if (!infos.contains(containerId)) { > - return Failure("Container has or is being destroyed"); > + return Failure("I/O switchboard has shutdown"); > } > > // TODO(jieyu): We might still get a connection refused error > @@ -758,8 +758,6 @@ Future<Nothing> IOSwitchboard::cleanup( > Option<pid_t> pid = infos[containerId]->pid; > Future<Option<int>> status = infos[containerId]->status; > > - infos.erase(containerId); > - > // If we have a pid, then we attempt to send it a SIGTERM to have it > // shutdown gracefully. This is best effort, as it's likely that the > // switchboard has already shutdown in the common case. > @@ -794,6 +792,19 @@ Future<Nothing> IOSwitchboard::cleanup( > // DISCARDED cases as well. > return await(list<Future<Option<int>>>{status}).then( > defer(self(), [this, containerId]() -> Future<Nothing> { > + // We only remove the 'containerId from our info struct once > + // we are sure that the I/O switchboard has shutdown. If we > + // removed it any earlier, attempts to connect to the I/O > + // switchboard would fail. > + // > + // NOTE: One caveat of this approach is that this lambda will > + // be invoked multiple times if `cleanup()` is called multiple > + // times before the first instance of it is triggered. This is > + // OK for now because the logic below has no side effects. If > + // the logic below gets more complicated, we may need to > + // revisit this approach. > + infos.erase(containerId); > + > // Best effort removal of the unix domain socket file created for > // this container's `IOSwitchboardServer`. If it hasn't been > // checkpointed yet, or the socket file itself hasn't been > created, > >