In the interest of doing our due diligence, have you studied any prior art?
For example, I was surprised to notice that HTCondor doesn't really provide this as a first-class feature:

https://lists.cs.wisc.edu/archive/htcondor-users/2006-November/msg00024.shtml

I didn't see it in any other systems I looked at either; instead, people suggest wrapping commands with the 'timeout' command. I suspect most systems leave it to the user to do this with a simple timeout wrapper script?

On Fri, Mar 23, 2018 at 2:21 PM, Zhitao Li <zhitaoli...@gmail.com> wrote:

> Hi everyone,
>
> I'd like to do an API review for MESOS-8725
> <https://issues.apache.org/jira/browse/MESOS-8725>. We are adding an
> optional `max_duration` field to `TaskInfo`. If a task does not terminate
> within this duration, built-in executors will kill the task with a new
> reason `REASON_MAX_DURATION_REACHED`.
>
> Proof of concept patch:
> https://reviews.apache.org/r/66258/
>
> Reference implementation in command executor:
> https://reviews.apache.org/r/66259/
>
> A design choice we made is to make this a relative duration rather than an
> absolute deadline timestamp. Our rationale:
>
> - The cluster could suffer from clock skew, so the same absolute deadline
>   would result in inconsistent behavior;
> - A framework can trivially use its own clock as the source of truth to
>   translate an absolute deadline into a duration, since
>   deadline = current time + max_duration.
>
> Please let me know what you think. Thanks.
>
> --
> Cheers,
>
> Zhitao Li
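For comparison, here is a minimal Python sketch of the two pieces discussed in this thread. It is not Mesos code and rests on a few assumptions. `max_duration_from_deadline` shows the framework-side translation (max_duration = deadline - now, measured on the framework's own clock). `run_with_max_duration` is a user-level timeout wrapper in the spirit of the 'timeout' approach, built on the standard library's `subprocess.run(timeout=...)`. Both function names and the returned status strings are hypothetical; "MAX_DURATION_REACHED" only loosely mirrors the proposed `REASON_MAX_DURATION_REACHED`.

```python
import subprocess
import sys
import time
from typing import List, Optional


def max_duration_from_deadline(deadline: float, now: Optional[float] = None) -> float:
    """Framework side (hypothetical helper): turn an absolute deadline, in
    seconds since the epoch on the framework's own clock, into a relative
    duration. Already-passed deadlines clamp to zero."""
    if now is None:
        now = time.time()
    return max(0.0, deadline - now)


def run_with_max_duration(argv: List[str], max_duration: float) -> str:
    """Wrapper side (hypothetical helper): run argv and kill it if it has
    not exited within max_duration seconds."""
    try:
        completed = subprocess.run(argv, timeout=max_duration)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and reaps the child before re-raising.
        return "MAX_DURATION_REACHED"
    return "FINISHED" if completed.returncode == 0 else "FAILED"


if __name__ == "__main__":
    # Example: a task that must finish within 2 seconds of "now".
    budget = max_duration_from_deadline(time.time() + 2.0)
    print(run_with_max_duration([sys.executable, "-c", "import time; time.sleep(5)"], budget))
```

Because the wrapper only ever receives a relative duration, skew between the framework's clock and the agent's wall clock does not shift when the kill happens, which is the rationale given for choosing a duration over an absolute deadline.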