Re: Review Request: Terminate executors that aren't needed.

Brenden Matthews Mon, 06 May 2013 12:04:08 -0700


> On May 3, 2013, 7:37 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 1085-1089
> > <https://reviews.apache.org/r/10932/diff/1/?file=287641#file287641line1085>
> >
> >     this seems like the wrong thing to do. an executor can run more than 
> > one task. why do you want to kill the executor if it could get more tasks?
> 
> Brenden Matthews wrote:
>     I think this is a bug.
>     
>     I've had many cases where the executor launches, starts a task, and the 
> task is killed before it has finished launching.  This results in the task 
> continuing to run indefinitely or until the mesos slave process is restarted.
> 
> Vinod Kone wrote:
>     I don't think I follow the sequence of events here. Is it as follows?
>     
>     --> Slave launches an executor
>     --> Before the executor registers with the slave, the framework asks to 
> kill a task (do you know why?)
>     --> When the executor registers it doesn't get any task from the slave
>     --> The executor is running without any task.
>     
>     I don't understand what you mean by "task continuing to run 
> indefinitely". Do you mean "the executor runs indefinitely"? If it's the 
> latter, it seems the right semantics for a general purpose executor. Am I 
> missing something?
> 
> Brenden Matthews wrote:
>     I'm sorry, I realize that wasn't very clear.  I went digging for logs but 
> I can't find an example (it seems to have been all rotated out).
>     
>     And yes, that sounds correct.
>     
>     I'm not actually sure what the cause of this is.  The Hadoop scheduler 
> will occasionally kill tasks, so it could be that (but I haven't scoured the 
> logs to determine the cause).
> 
> Ben Mahler wrote:
>     I agree with Brenden here that this is unexpected. Currently, all 
> executors have to handle the case where they start and _never_ receive a 
> launchTask. That seems broken to me, since the expectation is that we've 
> launched the executor in order to launch a task in the first place.
>     
>     After talking with Vinod I think there are three ways to "fix" this:
>     
>     1) In the ExecutorDriver, create a timeout to commit suicide when no 
> launch task is received within, say, 10 seconds of registering with the slave.
>     
>     2) Send the launch task to the executor anyway, immediately followed by 
> the kill task request. This is tricky.
>     
>     3) Leave as is and have the MesosExecutor for Hadoop commit suicide if no 
> task is received within 10 seconds of registration. Again, this only fixes 
> the issue for the Hadoop executor.
>     
>     So 1 seems to be the best option here. Other thoughts Vinod? Brenden, do 
> you want to take that on, or file an issue?


I wrote a quick patch, if I understand your proposal correctly.
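
For readers following the thread, option 1 could look roughly like the sketch below. It is not the patch under review and does not use the real ExecutorDriver or libprocess APIs. The class `LaunchWatchdog` and its methods are hypothetical names, and the 10 second value is just the figure suggested above. The idea: start a timer once the executor registers, cancel it when the first launchTask arrives, and run a callback (for example, one that shuts the executor down) if the timer expires first.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical helper, not part of Mesos: runs `onTimeout` if cancel()
// is not called within `timeout` of start().
class LaunchWatchdog
{
public:
  LaunchWatchdog(std::chrono::milliseconds timeout,
                 std::function<void()> onTimeout)
    : timeout_(timeout), onTimeout_(std::move(onTimeout)) {}

  ~LaunchWatchdog()
  {
    // Make sure the timer thread wakes up and exits before destruction.
    cancel();
    if (thread_.joinable()) {
      thread_.join();
    }
  }

  // Call once, after the executor has registered with the slave.
  void start()
  {
    thread_ = std::thread([this]() {
      std::unique_lock<std::mutex> lock(mutex_);
      // wait_for returns the predicate's value: false means the timeout
      // elapsed without cancel() having been called.
      bool cancelled =
        cv_.wait_for(lock, timeout_, [this]() { return cancelled_; });
      if (!cancelled) {
        lock.unlock();
        onTimeout_(); // e.g. stop the driver and exit the executor.
      }
    });
  }

  // Call when the first launchTask is received (or on normal shutdown).
  void cancel()
  {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      cancelled_ = true;
    }
    cv_.notify_all();
  }

private:
  std::chrono::milliseconds timeout_;
  std::function<void()> onTimeout_;
  std::mutex mutex_;
  std::condition_variable cv_;
  bool cancelled_ = false;
  std::thread thread_;
};
```

In this sketch the driver would construct something like `LaunchWatchdog(std::chrono::seconds(10), ...)`, call `start()` from its registered handler and `cancel()` from its launchTask handler. An executor that is meant to idle between tasks would need a way to opt out, which the sketch does not cover.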


- Brenden


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10932/#review20136
-----------------------------------------------------------


On May 6, 2013, 7:03 p.m., Brenden Matthews wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10932/
> -----------------------------------------------------------
> 
> (Updated May 6, 2013, 7:03 p.m.)
> 
> 
> Review request for mesos.
> 
> 
> Description
> -------
> 
> From 607072595b91993e2d47251ee841fb3dc5d84e05 Mon Sep 17 00:00:00 2001
> From: Brenden Matthews <[email protected]>
> Date: Fri, 3 May 2013 09:47:22 -0700
> Subject: [PATCH 8/9] Terminate executors that aren't needed.
> 
> If we launch an executor and then kill the task immediately after, make
> sure we also terminate the executor when there are no other tasks.
> ---
>  src/slave/slave.cpp |   48 +++++++++++++++++++++++++++---------------------
>  1 file changed, 27 insertions(+), 21 deletions(-)
> 
> 
> Diffs
> -----
> 
>   include/mesos/executor.hpp 9b25834 
>   src/exec/exec.cpp 1f022ca 
> 
> Diff: https://reviews.apache.org/r/10932/diff/
> 
> 
> Testing
> -------
> 
> Used in production at airbnb.
> 
> 
> Thanks,
> 
> Brenden Matthews
> 
>
