Re: Review Request 21424: Fixed orphaned container handling in the ExternalContainerizer recover implementation.

Till Toenshoff Mon, 19 May 2014 15:44:18 -0700


> On May 16, 2014, 7:19 p.m., Ben Mahler wrote:
> > src/slave/containerizer/external_containerizer.cpp, line 836
> > <https://reviews.apache.org/r/21424/diff/2/?file=583558#file583558line836>
> >
> >     What does according to the internal states mean?
> 
> Till Toenshoff wrote:
>     It means that the EC does not have any knowledge on such container ever 
> being launched - usually an indicator for an orphan, as discussed before.
>     
>     Your comment hints that I should rephrase.
>     
>     How about:
>     LOG(WARNING) << "Container '" << containerId << "' not running according 
> to the slave states."; ?
> 
> Ben Mahler wrote:
>     Why are we logging the same thing inside destroy, _destroy, and 
> __destroy? The current log messages here seem like they may be a bit 
> confusing for those reading the logs.
>     
>     We should have some logging that says what we're doing. In 'destroy', we 
> could either say:
>     
>     INFO: "Destroying container ID", or
>     INFO: "Attempting to destroy unknown container ID"
>     
>     We can make the second case a WARNING but it's expected that we'll 
> encounter orphans.
>     
>     In '__destroy', we could say say:
>     
>     INFO: "Destroyed container ID"
>     
>     What happens when in between 'destroy' and '_destroy', the 
> 'launched->future()' never completes? Should there be a timeout to proceed 
> with destruction?

While getting the tests working, I decided to give this a makeover. So instead 
of allowing destroy-invocations on non existent "actives", I am now generating 
a partial "Container" state that prevents all those confusing logs. Just as you 
said, orphans are expected and their handling is getting properly tested in the 
SlaveRecoveryTests.

Now referring to your comment on the launched future chaining; adding a timeout 
might indeed be a good idea. Especially considering that all external calls 
(not just destroy) are chained onto that future. I will open up a JIRA ticket 
for such improvement for the EC. Thanks Ben!

For the logging in general, I was trying to minimize the INFO logging. Right 
now, it should be close to what the MesosContainerizer is doing. For better 
debugging purposes, I have additionally been using VLOG(1) and VLOG(2). VLOG(1) 
is meant to be a reflection on the current actions (code path). VLOG(2) is 
meant to be an indicator of states. Even though I added a comment on this 
within external_containerizer.hpp, it might actually be a good idea to 
reiterate and increase the INFO-log-output of the EC. Will start a little 
discussion on that as well.

- Till

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21424/#review43244
-----------------------------------------------------------

On May 16, 2014, 7:20 p.m., Till Toenshoff wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21424/
> -----------------------------------------------------------
> 
> (Updated May 16, 2014, 7:20 p.m.)
> 
> 
> Review request for mesos and Niklas Nielsen.
> 
> 
> Bugs: MESOS-1364
>     https://issues.apache.org/jira/browse/MESOS-1364
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> An orphaned container is known to the ECP but not to the EC, thus not 
> recoverable but pending. This patch enforces a call to destroy for any orphan 
> that has been identified as such during the recovery phase.
> 
> 
> Diffs
> -----
> 
>   src/slave/containerizer/external_containerizer.hpp afffff1 
>   src/slave/containerizer/external_containerizer.cpp 2ff19b1 
> 
> Diff: https://reviews.apache.org/r/21424/diff/
> 
> 
> Testing
> -------
> 
> make check (note that the tests currently do not cover this scenario)
> 
> 
> Thanks,
> 
> Till Toenshoff
> 
>

Re: Review Request 21424: Fixed orphaned container handling in the ExternalContainerizer recover implementation.

Reply via email to