Re: reasonable preemption delay to use

Maxim Khutornenko Tue, 17 Feb 2015 09:39:45 -0800

The watch_secs timer starts when a task enters RUNNING. For the
rolling update not to fail early, restart_threshold [1] needs to be
raised to account for the preemption delay.
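
As a rough sketch of that arithmetic (the helper name and the 60-second
startup buffer are assumptions for illustration, not anything Aurora
provides), restart_threshold could be sized so it outlasts the
preemption delay:

```python
# Hypothetical helper, not part of Aurora: picks a restart_threshold
# (in seconds) large enough to cover the scheduler's preemption delay.

def min_restart_threshold(preemption_delay_secs, startup_buffer_secs=60):
    """Return a restart_threshold that outlasts the preemption delay.

    startup_buffer_secs is an assumed allowance for scheduling and task
    startup once the preempted resources have been freed.
    """
    if preemption_delay_secs < 0 or startup_buffer_secs < 0:
        raise ValueError("durations must be non-negative")
    return preemption_delay_secs + startup_buffer_secs


# Compare the default 10-minute delay with the proposed 2-minute delay.
default_threshold = min_restart_threshold(10 * 60)
proposed_threshold = min_restart_threshold(2 * 60)
print(default_threshold, proposed_threshold)
```

The resulting number would then go into the restart_threshold field of
the job's UpdateConfig; the right buffer depends on how long your tasks
take to start.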


As for the default preemption delay, it was implemented to avoid
unnecessary churn in the cluster. Larger or more constraint-diverse
tasks take longer to bin-place, so there can be occasional scheduling
delays when resources are tight. Hence the grace buffer. You can
definitely tune it to the specifics of your cluster.

Thanks,
Maxim

[1] - https://github.com/apache/incubator-aurora/blob/master/docs/configuration-reference.md#updateconfig-objects

On Tue, Feb 17, 2015 at 12:51 AM, Erb, Stephan
<stephan....@blue-yonder.com> wrote:
> If I remember correctly, you also have to make sure that your UpdateConfig 
> watch_secs is larger than your preemption_delay. Otherwise a rolling update 
> of a production job might not be able to get the resources it needs.
>
> Best Regards,
> Stephan
> ________________________________________
> From: Bhuvan Arumugam <bhu...@apache.org>
> Sent: Monday, February 16, 2015 7:14 AM
> To: dev@aurora.incubator.apache.org
> Subject: reasonable preemption delay to use
>
> Hello,
> Recently, in one of our clusters, we noticed production jobs going to
> PENDING state due to insufficient CPU. The non-production jobs are
> not preempted, as we haven't used the --preemption_delay flag for the
> scheduler. The default value for this flag is 10mins. Why is it so
> high? Is there any reasoning behind using 10mins as the default value?
>
> We are thinking of using 2mins for this flag. We wouldn't want to
> wait beyond 2mins to run a prod job when resources are constrained. Does
> that sound reasonable? What's the typical preemption delay used by SREs?
>
> --
> Regards,
> Bhuvan Arumugam
> www.livecipher.com
