Re: Graceful Shutdown Design

Benjamin Mahler Fri, 14 Nov 2014 11:50:28 -0800

For (2), sorry, I get a bit confused when "executor" and "CommandExecutor"
are used interchangeably.


So let me confirm we're on the same page: Task finalization is only
relevant for the CommandExecutor, which assumes that all tasks need a
similar grace period. For regular executors, the finalization of a task is
entirely in the executor's hands; the slave does not use a grace period to
impose anything here. For all executors (CommandExecutor included), a grace
period will be used when the executor is being shut down. Does this match
your understanding?
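
For illustration, the signal escalation discussed in this thread (SIGTERM
first, SIGKILL once the grace period expires) can be sketched roughly as
below. This is a minimal Python sketch, not Mesos code: the names
kill_with_escalation and grace_period_secs are hypothetical, and it assumes a
POSIX system where terminate() sends SIGTERM and kill() sends SIGKILL.

```python
import subprocess
import sys


def kill_with_escalation(proc, grace_period_secs):
    """Ask a child process to stop, escalating if it does not comply.

    Hypothetical helper, not part of Mesos. Returns the name of the
    last signal that was sent before the process was reaped.
    """
    proc.terminate()  # SIGTERM on POSIX: request a clean shutdown
    try:
        # Give the process the whole grace period to finalize.
        proc.wait(timeout=grace_period_secs)
        return "SIGTERM"
    except subprocess.TimeoutExpired:
        # Grace period is over: escalate to SIGKILL and reap the child.
        proc.kill()
        proc.wait()
        return "SIGKILL"


if __name__ == "__main__":
    # Start a child that would otherwise sleep for a minute.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print(kill_with_escalation(child, grace_period_secs=2.0))
```

In this sketch a single timeout plays both roles (grace period and
escalation timeout), which mirrors the conflation described above for the
CommandExecutor case.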

On Fri, Nov 14, 2014 at 5:37 AM, Alex Rukletsov <[email protected]> wrote:

> (1) When the ShutdownProcess is gone, we'll adjust the timeout calculation
> logic.
>
> (2) In the proposal, we tie the grace period to the executor, but use it
> for task finalization. Effectively we assume that each executor launches
> similar tasks. This may seem a bit weird, but having timeouts per task
> makes the change more intrusive. I would propose to revisit this concept
> when we start discussing a more general "finalization" approach.
>
> (3) I think the concept of custom finalization may be useful. Anyway, we
> need to have a timeout for the cases when finalization is stuck or takes
> too much time. It would be nice to have a single flag for such a timeout.
> Right now we already have two: one for executor shutdown and one for the
> docker stop timeout.
>
> I'm glad that we all agree the current graceful shutdown configuration
> needs some love. I'll follow up with the patches soon.
>
> On Thu, Nov 13, 2014 at 9:59 PM, Benjamin Mahler <[email protected]> wrote:
>
> > Short term, cleaning up the current static configuration mess sounds
> > good.
> >
> > Some food for thought for the longer term:
> >
> > (1) Keep in mind that the ShutdownProcess inside ExecutorProcess will be
> > going away when we have pure language bindings.
> >
> > (2) Killing a task and shutting down an executor are independent
> > concepts. Unfortunately, executor shutdown is not exposed in the
> > framework API yet. So, when it comes to custom executors, the grace
> > period (or whatever other finalization [1]) is *completely* in the hands
> > of the framework.
> >
> > (3) We provide CommandExecutor as a convenience, and as we've discovered
> > broadly useful concepts for frameworks, like health checking, we've added
> > them in. It sounds like doing "finalization" might be another broadly
> > useful concept, wherein, much like frameworks can control the definition
> > of a health check, they will want to control the definition of
> > "finalization".
> >
> > Thoughts?
> >
> > [1] I believe Thermos exposes some finalization, might be useful to
> > reference:
> >
> > http://aurora.incubator.apache.org/documentation/latest/configuration-reference/#final
> >
> > On Wed, Nov 12, 2014 at 3:06 PM, Niklas Nielsen <[email protected]>
> > wrote:
> >
> > > I thought of signal escalation as per-executor, or actually everywhere
> > > we execute a command info as a subprocess.
> > > The new grace period is meant as the time an executor has to finish off
> > > its things; the other timeouts had to be changed because they will in
> > > most cases be shorter.
> > > For custom executors, it is up to them to honor the timeout; otherwise,
> > > the executor process will kill it after timeout + delta time.
> > >
> > > Ben, are you thinking of a more generalized finalization mechanism
> > > (pluggable, programmable)?
> > >
> > > Niklas
> > >
> > > On 11 November 2014 10:34, Alex Rukletsov <[email protected]> wrote:
> > >
> > > > Ben,
> > > >
> > > > there are two scenarios: executor shutdown and killTask() in
> > > > CommandExecutor. For the first use case, each custom executor is
> > > > affected through the ExecutorProcess, which means two levels are
> > > > involved (containerizer and executor) and should be synchronized.
> > > >
> > > > In the second scenario, each task is tied to its own CommandExecutor,
> > > > therefore killing a task implies killing its executor. In this case,
> > > > the grace shutdown period also becomes a signal escalation timeout,
> > > > and conflating them, I think, is a good idea. The proposed design doc
> > > > is an effort to align timeouts along the chain from slave to
> > > > CommandExecutor.
> > > >
> > > > If I understand you correctly, we want to shut down any executor
> > > > (task) gracefully and not tie the grace period to CommandExecutor
> > > > only. A good example pointed out by Ankur Chauhan is MESOS-1925
> > > > <https://issues.apache.org/jira/browse/MESOS-1925>: we can reuse the
> > > > same grace shutdown flag for docker containers. And if we later enable
> > > > frameworks to adjust timeouts for their tasks (or executors, to be
> > > > precise), we will be able to align the timeout used by docker
> > > > finalization with the timeout in the docker container.
> > > >
> > > > On Mon, Nov 10, 2014 at 10:00 PM, Benjamin Mahler <[email protected]>
> > > > wrote:
> > > >
> > > > > I'm guessing most of the motivation here is actually for task
> > > > > killing escalation in the command executor? The shutdown grace
> > > > > period was designed for executor shutdown only, which today occurs
> > > > > only when the framework is being shut down (or recovery is cleaning
> > > > > up), or in the future, when frameworks ask to shut down a specific
> > > > > executor.
> > > > >
> > > > > In the case of the command executor, the slave won't do any
> > > > > escalation when a killTask arrives, since it's not trying to shut
> > > > > down the executor. For simplicity (I'm guessing), we conflated the
> > > > > executor shutdown grace period with the killTask signal escalation
> > > > > in the command executor.
> > > > >
> > > > > So, I'm still trying to figure out the concrete use case here: is
> > > > > it that you have command-tasks that implement a clean shutdown
> > > > > driven by SIGTERM? Going forward, is that enough or would we want a
> > > > > more general notion of "Finalization" (e.g. driven by HTTP, or
> > > > > SIGTERM, or subprocess, etc), much like the generic health checking
> > > > > that was added?
> > > > >
> > > > > On Mon, Nov 10, 2014 at 8:08 AM, Alex Rukletsov <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I would like to share the design doc for configurable grace period
> > > > > > <https://docs.google.com/document/d/1_b3OPv3tjkub1T6VhQ27GnDfbVjnJ6IQ4ufPQhV1HM8/edit?usp=sharing>.
> > > > > > The doc describes two approaches to calculate nested grace periods,
> > > > > > points out implementation details and opens several design
> > > > > > questions.
> > > > > >
> > > > > > I would highly appreciate any thoughts, ideas and suggestions!
> > > > > >
> > > > > > Thanks,
> > > > > > Alex
> > > > > >
> > > > >
> > > >
> > >
> >
>
