Re: Graceful Shutdown Design

Benjamin Mahler Thu, 13 Nov 2014 13:02:45 -0800

Short term, cleaning up the current static configuration mess sounds good.

Some food for thought for the longer term:


(1) Keep in mind that the ShutdownProcess inside ExecutorProcess will be
going away when we have pure language bindings.

(2) Killing a task and shutting down an executor are independent concepts.
Unfortunately, executor shutdown is not exposed in the framework API yet.
So, when it comes to custom executors, the grace period (or whatever other
finalization [1]) is *completely* in the hands of the framework.

(3) We provide CommandExecutor as a convenience, and as we've discovered
concepts that are broadly useful for frameworks, like health checking,
we've added them in. It sounds like "finalization" might be another broadly
useful concept: much like frameworks can control the definition of a
health check, they will want to control the definition of "finalization".
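
To make the simplest form of finalization discussed in this thread concrete
(SIGTERM, a grace period, then SIGKILL), here is a minimal Python sketch. It
is not Mesos code: the function name kill_with_escalation and the parameter
grace_period_secs are hypothetical, and it assumes a POSIX system where the
task runs as a direct child process.

```python
import signal
import subprocess
import sys


def kill_with_escalation(proc, grace_period_secs):
    """Ask a child process to stop, escalating if it does not comply.

    Hypothetical helper, not part of any Mesos API. Sends SIGTERM, waits
    up to grace_period_secs for the process to exit, and sends SIGKILL if
    it is still running. Returns the process's return code (on POSIX, the
    negative signal number if it was terminated by a signal).
    """
    proc.send_signal(signal.SIGTERM)
    try:
        # Grace period: give the task time to finalize on its own.
        return proc.wait(timeout=grace_period_secs)
    except subprocess.TimeoutExpired:
        # Escalation: the task ignored or outlived SIGTERM.
        proc.kill()
        return proc.wait()


if __name__ == "__main__":
    # Example: a child that sleeps and does not handle SIGTERM specially.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"])
    print("return code:", kill_with_escalation(child, grace_period_secs=3))
```

A more general "finalization" would replace the send_signal call with
whatever the framework defines (an HTTP request, a subprocess, and so on),
while keeping the same timeout-then-kill structure.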

Thoughts?

[1] I believe Thermos exposes some finalization, might be useful to
reference:
http://aurora.incubator.apache.org/documentation/latest/configuration-reference/#final

On Wed, Nov 12, 2014 at 3:06 PM, Niklas Nielsen <[email protected]>
wrote:

> I thought of signal escalation as per-executor, or actually as applying
> everywhere we execute a command info as a subprocess.
> The new grace period is meant as the time an executor has to finish off
> its work; the other timeouts had to be changed because in most cases
> they will be shorter.
> For custom executors, it is up to them to honor the timeout; otherwise,
> the executor process will kill it after timeout + delta time.
>
> Ben, are you thinking of a more generalized finalization mechanism
> (pluggable, programmable)?
>
> Niklas
>
> On 11 November 2014 10:34, Alex Rukletsov <[email protected]> wrote:
>
> > Ben,
> >
> > there are two scenarios: executor shutdown and killTask() in
> > CommandExecutor. For the first use case, each custom executor is affected
> > through the ExecutorProcess, that means two levels are involved
> > (containerizer and executor) and should be synchronized.
> >
> > In the second scenario, each task is tied to its own CommandExecutor,
> > therefore killing a task implies killing its executor. In this case, the
> > graceful shutdown period also becomes a signal escalation timeout, and
> > conflating them, I think, is a good idea. The proposed design doc is an
> > effort to align timeouts along the chain from slave to CommandExecutor.
> >
> > If I understand you correctly, we want to shut down any executor (task)
> > gracefully, and not tie the grace period to CommandExecutor only. A good
> > example pointed out by Ankur Chauhan is MESOS-1925
> > <https://issues.apache.org/jira/browse/MESOS-1925>: we can reuse the
> > same graceful shutdown flag for Docker containers. And if we later enable
> > frameworks to adjust timeouts for their tasks (or executors, to be
> > precise), we will be able to align the timeout used by docker
> > finalization with the timeout in the docker container.
> >
> > On Mon, Nov 10, 2014 at 10:00 PM, Benjamin Mahler <[email protected]>
> > wrote:
> >
> > > I'm guessing most of the motivation here is actually for task killing
> > > escalation in the command executor? The shutdown grace period was
> > > designed for executor shutdown only, which today occurs only when the
> > > framework is being shut down (or recovery is cleaning up), or in the
> > > future, when frameworks ask to shut down a specific executor.
> > >
> > > In the case of the command executor, the slave won't do any escalation
> > > when a killTask arrives, since it's not trying to shut down the
> > > executor. For simplicity (I'm guessing), we conflated the executor
> > > shutdown grace period with the killTask signal escalation in the
> > > command executor.
> > >
> > > So, I'm still trying to figure out the concrete use case here. Is it
> > > that you have command-tasks that implement a clean shutdown driven by
> > > SIGTERM? Going forward, is that enough, or would we want a more general
> > > notion of "Finalization" (e.g. driven by HTTP, or SIGTERM, or
> > > subprocess, etc.), much like the generic health checking that was added?
> > >
> > > On Mon, Nov 10, 2014 at 8:08 AM, Alex Rukletsov <[email protected]>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like to share the design doc for configurable grace period
> > > > <https://docs.google.com/document/d/1_b3OPv3tjkub1T6VhQ27GnDfbVjnJ6IQ4ufPQhV1HM8/edit?usp=sharing>.
> > > > The doc describes two approaches to calculate nested grace periods,
> > > > points out implementation details and opens several design questions.
> > > >
> > > > I would highly appreciate any thoughts, ideas and suggestions!
> > > >
> > > > Thanks,
> > > > Alex
> > > >
> > >
> >
>
