Re: Graceful task shutdown

Brian Brazil Tue, 07 Apr 2015 14:16:44 -0700

On 7 April 2015 at 21:28, Erb, Stephan <[email protected]> wrote:


> Brian, do you have any particular plans regarding your shutdown
> requirements? I have seen that you have filed another issue [1] which is
> also concerned with graceful shutdown.
>

Given this thread, I now only wish to hit a different endpoint than
/quitquitquit (and I may as well do /abortabortabort while I'm at it). The
rest is changes to our internal shutdown handling.
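
To make the shape of that concrete, here is a minimal Python sketch of the escalation discussed in this thread. It is not the actual thermos_task_runner code. It assumes the runner posts to each lifecycle endpoint in turn, waits only when the signaler reports that the request was delivered (Stephan's option a), and then sends TERM. The names graceful_shutdown, send, terminate and sleep are hypothetical; the endpoint list and wait time are plain parameters, so a different endpoint or a longer wait could be passed in.

```python
import time
from typing import Callable, Iterable, List


def graceful_shutdown(
    send: Callable[[str], bool],
    terminate: Callable[[], None],
    endpoints: Iterable[str] = ("/quitquitquit", "/abortabortabort"),
    wait_secs: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Walk the escalation ladder; return the endpoints that were delivered.

    send(endpoint) is a hypothetical stand-in for the HTTP signaler and
    must return True only if the request reached the service.
    terminate() is a hypothetical stand-in for sending TERM to the task.
    """
    delivered = []
    for endpoint in endpoints:
        if send(endpoint):
            delivered.append(endpoint)
            # Give the service time to react before escalating further.
            sleep(wait_secs)
        # If delivery failed, skip the wait: nothing is listening.
    terminate()
    return delivered
```

With a service that does not implement the lifecycle endpoints, send returns False for both and the function goes straight to terminate without waiting.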


> Stephan
>
> PS: For what it's worth, I implemented the 'quick fix' version to my
> problem stated in the beginning of this thread [2].
>

That's handy. When writing the code up today I noticed that hitting
/quitquitquit wasn't unit tested. I hope to have that up for review tomorrow
with unit tests, which you could build on to do a more end-to-end test
for your code.

Brian


> [1] https://issues.apache.org/jira/browse/AURORA-1257
> [2] https://reviews.apache.org/r/32889/
>
> ________________________________________
> From: Brian Brazil <[email protected]>
> Sent: Tuesday, March 24, 2015 10:48 PM
> To: [email protected]
> Subject: Re: Graceful task shutdown
>
> On 24 March 2015 at 21:33, George Sirois <[email protected]> wrote:
>
> > Unfortunately I don't think my change will be able to make it in as-is.
> >
> > As Brian Wickman pointed out, it could introduce serious problems because
> > there are varying timeouts across the scheduler/executor, so if you set
> > your wait time to be too high, the scheduler might start to consider the
> > tasks lost because they stayed in the transient KILLING state for too
> > long.
> >
>
> Hmm, what sort of work is involved in resolving that?
>
> In my case I need at least 12s after the /qqq before sending the TERM.
>
> Brian
>
>
> >
> > I do think the lifecycle modules idea would solve Stephan's issue.
> >
> > On Tue, Mar 24, 2015 at 5:06 PM, Brian Brazil <[email protected]>
> > wrote:
> >
> > > On 24 March 2015 at 20:57, Erb, Stephan <[email protected]>
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > we are implementing the /health endpoint in our services but omit the
> > > > implementation of the unauthenticated lifecycle methods /quitquitquit
> > > > and /abortabortabort.
> > > >
> > > > As a consequence, stopping a service is taxed by 10 seconds waiting
> > > > time [1]. I would like to get rid of this unnecessary delay and can
> > > > think of two solutions:
> > > >
> > > > a) Only perform the escalation wait when the http_signaler reports
> > > > that the message could be delivered to the service. This is a rather
> > > > simple and localized fix.
> > > >
> > > > b) Use another port for lifecycle events. This would require a new
> > > > addition to the task configuration and proper plumbing throughout the
> > > > rest of the system. Backward compatibility could be achieved by using
> > > > 'health' as the default lifecycle management port.
> > > >
> > > > Any thoughts? I would be happy with the simple solution, but in the
> > > > end it's your call :-)
> > > >
> > >
> > > __george mentioned on IRC that he is working on a change that'll let the
> > > wait time be configurable (which is something I also need). Would that
> > > cover your use case?
> > >
> > > There were also discussions on IRC about custom lifecycle modules.
> > >
> > > Brian
> > >
> > >
> > > >
> > > > Best Regards,
> > > > Stephan
> > > >
> > > > [1] https://github.com/apache/incubator-aurora/blob/master/src/main/python/apache/aurora/executor/thermos_task_runner.py#L123
