[ 
https://issues.apache.org/jira/browse/MESOS-999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628517#comment-14628517
 ] 

Yan Xu commented on MESOS-999:
------------------------------

I am looking at this in two ways:
1. The original {{--executor_registration_timeout}} was added because of the 
potentially long delay caused by fetching. However, with the new split into a 
launch timeout and an executor registration timeout, the fetching delay and the 
provisioning delay are both lumped into the launch delay, and the registration 
timeout is no longer very useful because registration itself should be fairly 
quick. In fact, a similar timeout, 
{{const Duration EXECUTOR_REREGISTER_TIMEOUT = Seconds(2);}}, is not even 
exposed by a flag. So instead of creating finer-grained timeouts, we are 
effectively replacing one with another.

2. End-to-end timeout vs. multiple fine-grained ones. Multiple timeouts add 
complexity in operation (they need to be configured separately) and in 
implementation (we may need to introduce more states to implement them 
properly), but there is only one reason for them, which is, AFAIC, to prevent a 
task from being stuck for too long before it transitions into the RUNNING state 
(so a framework can reschedule it elsewhere). So in this sense one coarse 
end-to-end timeout is all we need. Can you provide examples of when the 
operator would find it useful to specifically configure timeouts for different 
stages?
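
To make the comparison in point 2 concrete, here is a minimal sketch of the two 
policies. It is not Mesos code: {{Stage}}, {{exceedsPerStage}} and 
{{exceedsEndToEnd}} are hypothetical names, and the durations used with them 
are made-up numbers for illustration only.

```cpp
#include <chrono>
#include <string>
#include <vector>

using Seconds = std::chrono::seconds;

// Hypothetical record for one stage before a task reaches RUNNING
// (e.g. fetch, provision, register). Not a Mesos type.
struct Stage {
  std::string name;
  Seconds elapsed;  // how long the stage actually took
  Seconds timeout;  // per-stage limit, used only by the fine-grained policy
};

// Fine-grained policy: the task is considered stuck as soon as any single
// stage exceeds its own limit.
bool exceedsPerStage(const std::vector<Stage>& stages) {
  for (const Stage& stage : stages) {
    if (stage.elapsed > stage.timeout) {
      return true;
    }
  }
  return false;
}

// End-to-end policy: only the total time spent before RUNNING matters.
bool exceedsEndToEnd(const std::vector<Stage>& stages, Seconds limit) {
  Seconds total{0};
  for (const Stage& stage : stages) {
    total += stage.elapsed;
  }
  return total > limit;
}
```

For example, with an assumed fetch of 50s against a 40s limit, provisioning of 
5s against 30s and registration of 1s against 10s, the per-stage policy fails 
the task on the fetch stage, while a single 80s end-to-end limit lets it through 
because the total is only 56s. The operator has one knob to tune in the second 
case and three in the first.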


> Slave should wait() and start executor registration timeout after launch 
> -------------------------------------------------------------------------
>
>                 Key: MESOS-999
>                 URL: https://issues.apache.org/jira/browse/MESOS-999
>             Project: Mesos
>          Issue Type: Bug
>          Components: isolation
>    Affects Versions: 0.18.0
>            Reporter: Ian Downes
>            Priority: Minor
>
> The current code will start to launch a container and wait on it before the 
> launch is complete. We should do this only after the container has 
> successfully launched. Likewise for the executor registration timeout.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
