I created a ticket, and I believe I have a patch, if you could triage this task:
https://issues.apache.org/jira/browse/AURORA-1706
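
Roughly, the change could take the shape below. This is only a standalone sketch under stated assumptions, not the actual patch: the flag name --preemption-wait-secs, the SketchTaskRunner and SketchRunnerProvider classes, and make_runner are all made-up placeholders standing in for ThermosTaskRunner and DefaultThermosTaskRunnerProvider, and it uses argparse and timedelta instead of the executor's real option parsing and Amount/Time.

```python
import argparse
from datetime import timedelta

# Mirrors the currently hardcoded 1 minute default.
DEFAULT_PREEMPTION_WAIT = timedelta(minutes=1)


class SketchTaskRunner:
    """Hypothetical stand-in for ThermosTaskRunner; only holds the timeout."""

    def __init__(self, task_id, preemption_wait=DEFAULT_PREEMPTION_WAIT):
        self.task_id = task_id
        # How long to wait after SIGTERM before escalating to a hard kill.
        self.preemption_wait = preemption_wait


class SketchRunnerProvider:
    """Hypothetical stand-in for DefaultThermosTaskRunnerProvider."""

    def __init__(self, preemption_wait=DEFAULT_PREEMPTION_WAIT):
        self._preemption_wait = preemption_wait

    def make_runner(self, task_id):
        # Every runner created by this provider inherits the same timeout.
        return SketchTaskRunner(task_id, preemption_wait=self._preemption_wait)


def build_parser():
    parser = argparse.ArgumentParser(description="executor flag sketch")
    parser.add_argument(
        "--preemption-wait-secs",  # assumed flag name, not an existing flag
        type=int,
        default=int(DEFAULT_PREEMPTION_WAIT.total_seconds()),
        help="Seconds to wait after SIGTERM before killing the task.",
    )
    return parser


def provider_from_args(argv):
    # Parse the executor-level flag and hand the value to the provider.
    opts = build_parser().parse_args(argv)
    return SketchRunnerProvider(
        preemption_wait=timedelta(seconds=opts.preemption_wait_secs))
```

With this, provider_from_args(["--preemption-wait-secs", "10"]) would give every runner a 10 second wait, while omitting the flag keeps today's 1 minute behavior.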
-Igor

On Wed, May 18, 2016 at 7:31 PM, Igor Morozov <igm...@gmail.com> wrote:
> Yes, we are running 0.13.1.
>
> Ok then, I'll file a task and will prepare a patch for review.
>
> Thanks,
> -Igor
>
> On Wed, May 18, 2016 at 6:51 PM, Maxim Khutornenko <ma...@apache.org> wrote:
>> I don't see much problem in making it configurable at the executor level.
>>
>> Just to make sure though, are you running your executors with this fix:
>> https://issues.apache.org/jira/browse/AURORA-1642?
>>
>> We had a similar problem where any kill took exactly 1 minute to complete,
>> hence the above fix.
>>
>> On Wed, May 18, 2016 at 5:46 PM, Igor Morozov <igm...@gmail.com> wrote:
>>
>>> Folks,
>>>
>>> We need to support a use case here at Uber where service processes
>>> don't respect the SIGTERM signal and get killed after the default hardcoded
>>> preemption timeout of 1 minute during task kill or task restart. That
>>> significantly slows down the upgrade workflow for such services.
>>> We'd like to control this timeout, essentially reducing it to 5-10 seconds.
>>>
>>> My current thinking is to expose the preemption_wait timeout
>>>
>>> class ThermosTaskRunner(TaskRunner):
>>>   ....
>>>   THERMOS_PREEMPTION_WAIT = Amount(1, Time.MINUTES)
>>>
>>> in the thermos executor flags and set it in
>>> DefaultThermosTaskRunnerProvider, eventually propagating to all
>>> ThermosRunner tasks.
>>>
>>> A proper fix would probably be something along the lines of making this
>>> timeout configurable per task config, but that would involve changing
>>> the pystachio thermos schema.
>>>
>>> Thoughts?
>>>
>>> -Igor Morozov
>>>
>
> --
> -Igor

--
-Igor