Re: Finalization wait timeout in thermos executor for a task's teardown sequence

Igor Morozov Thu, 19 May 2016 14:55:55 -0700

I created a ticket, and I believe I have a patch ready if you can triage it:
https://issues.apache.org/jira/browse/AURORA-1706


-Igor

On Wed, May 18, 2016 at 7:31 PM, Igor Morozov <[email protected]> wrote:
> Yes, we are running 0.13.1.
>
> Ok then, I'll file a task and will prepare a patch for review.
>
> Thanks,
> -Igor
>
> On Wed, May 18, 2016 at 6:51 PM, Maxim Khutornenko <[email protected]> wrote:
>> I don't see much problem in making it configurable at the executor level.
>>
>> Just to make sure though, are you running your executors with this fix:
>> https://issues.apache.org/jira/browse/AURORA-1642?
>>
>> We had a similar problem where any kill took exactly 1 minute to complete,
>> hence the above fix.
>>
>> On Wed, May 18, 2016 at 5:46 PM, Igor Morozov <[email protected]> wrote:
>>
>>> Folks,
>>>
>>> We need to support a use case here at Uber where service processes
>>> don't respect SIGTERM and are killed only after the default hardcoded
>>> preemption timeout of 1 minute during a task kill or task restart. That
>>> significantly slows down the upgrade workflow for such services.
>>> We'd like to control this timeout, essentially reducing it to 5-10 seconds.
>>>
>>> My current thinking is to expose the preemption_wait timeout
>>>
>>> class ThermosTaskRunner(TaskRunner):
>>>   ....
>>>   THERMOS_PREEMPTION_WAIT = Amount(1, Time.MINUTES)
>>>
>>> in the thermos executor flags and set it in
>>> DefaultThermosTaskRunnerProvider, eventually propagating it to all
>>> ThermosRunner tasks.
>>>
>>> A proper fix would probably be along the lines of making this
>>> timeout configurable per task config, but that would involve changing
>>> the pystachio thermos schema.
>>>
>>> Thoughts?
>>>
>>> -Igor Morozov
>>>
>
>
>
> --
> -Igor



-- 
-Igor
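
The proposal quoted above (expose the preemption wait as an executor flag, hand it to the runner provider, and let every runner pick it up) could look roughly like the sketch below. It is only an illustration under stated assumptions: the flag name `--preemption-wait-secs` and the `Sketch*` classes are hypothetical stand-ins, not the actual Aurora classes or the patch attached to AURORA-1706, and it uses only the Python standard library instead of the executor's own option handling.

```python
import argparse
from datetime import timedelta

# Mirrors the hardcoded 1-minute default mentioned in the thread.
DEFAULT_PREEMPTION_WAIT = timedelta(minutes=1)


class SketchTaskRunner:
    """Hypothetical stand-in for ThermosTaskRunner (not the Aurora class)."""

    def __init__(self, task_id, preemption_wait=DEFAULT_PREEMPTION_WAIT):
        self.task_id = task_id
        # How long to wait after a graceful stop request before a forced kill.
        self.preemption_wait = preemption_wait


class SketchRunnerProvider:
    """Hypothetical stand-in for DefaultThermosTaskRunnerProvider."""

    def __init__(self, preemption_wait=DEFAULT_PREEMPTION_WAIT):
        self._preemption_wait = preemption_wait

    def from_task(self, task_id):
        # Every runner created by this provider inherits the configured wait.
        return SketchTaskRunner(task_id, preemption_wait=self._preemption_wait)


def parse_preemption_wait(argv):
    """Read the (hypothetical) executor flag and return it as a timedelta."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--preemption-wait-secs',
        type=int,
        default=int(DEFAULT_PREEMPTION_WAIT.total_seconds()),
        help='Seconds to wait for a task to exit before it is killed.')
    args = parser.parse_args(argv)
    if args.preemption_wait_secs <= 0:
        raise ValueError('preemption wait must be a positive number of seconds')
    return timedelta(seconds=args.preemption_wait_secs)


if __name__ == '__main__':
    wait = parse_preemption_wait(['--preemption-wait-secs', '10'])
    runner = SketchRunnerProvider(wait).from_task('example-task')
    print(runner.task_id, runner.preemption_wait.total_seconds())
```

Setting the value once at the provider keeps the change local to the executor. A per-task setting, as the original message notes, would instead require a schema change.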
