I don't see much problem in making it configurable at the executor level. Just to make sure though, are you running your executors with this fix: https://issues.apache.org/jira/browse/AURORA-1642?
We had a similar problem where any kill took exactly 1 minute to complete, hence the above fix.

On Wed, May 18, 2016 at 5:46 PM, Igor Morozov <igm...@gmail.com> wrote:
> Folks,
>
> We need to support a use case here at Uber where service processes
> don't respect the SIGTERM signal and get killed after the default hardcoded
> preemption timeout of 1 minute during task kill or task restart. That
> significantly slows down the upgrade workflow for such services.
> We'd like to control this timeout, essentially reducing it to 5-10 seconds.
>
> My current thinking is to expose the preemption_wait timeout
>
> class ThermosTaskRunner(TaskRunner):
>   ....
>   THERMOS_PREEMPTION_WAIT = Amount(1, Time.MINUTES)
>
> in the thermos executor flags and set it in
> DefaultThermosTaskRunnerProvider, eventually propagating it to all
> ThermosRunner tasks.
>
> A proper fix would probably be something along the lines of making this
> timeout configurable per task config, but that would involve changing the
> pystachio thermos schema.
>
> Thoughts?
>
> -Igor Morozov
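
For illustration, here is a rough sketch of the executor-level approach discussed in this thread: a command-line flag carries the preemption wait, a provider stores it, and every runner the provider builds uses it when escalating from SIGTERM to SIGKILL. This is not Aurora code. The flag name `--preemption-wait-secs` and the classes `RunnerSketch` and `RunnerProviderSketch` are hypothetical stand-ins, and the standard-library `argparse` and `subprocess` modules are used here only to keep the example self-contained.

```python
import argparse
import subprocess

# Mirrors the 1-minute default mentioned in the thread.
DEFAULT_PREEMPTION_WAIT_SECS = 60.0


class RunnerSketch:
    """Hypothetical stand-in for a task runner; wraps a Popen-like object."""

    def __init__(self, process, preemption_wait_secs=DEFAULT_PREEMPTION_WAIT_SECS):
        self._process = process
        self._preemption_wait_secs = preemption_wait_secs

    def stop(self):
        """Send SIGTERM, wait up to the configured time, then force-kill."""
        self._process.terminate()
        try:
            self._process.wait(timeout=self._preemption_wait_secs)
            return "terminated"
        except subprocess.TimeoutExpired:
            # The process ignored SIGTERM for the whole wait; escalate.
            self._process.kill()
            self._process.wait()
            return "killed"


class RunnerProviderSketch:
    """Hypothetical provider that propagates one timeout to every runner."""

    def __init__(self, preemption_wait_secs):
        self._preemption_wait_secs = preemption_wait_secs

    def from_process(self, process):
        return RunnerSketch(process, self._preemption_wait_secs)


def parse_args(argv):
    """Parse the (hypothetical) executor flag that carries the timeout."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--preemption-wait-secs",
        type=float,
        default=DEFAULT_PREEMPTION_WAIT_SECS,
        help="Seconds to wait after SIGTERM before sending SIGKILL.",
    )
    return parser.parse_args(argv)
```

With this shape, starting the executor with `--preemption-wait-secs 5` would make every runner built by the provider wait 5 seconds instead of 60 before force-killing. A per-task setting, as suggested at the end of the quoted message, would instead have to travel through the task config schema.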