[ 
https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111816#comment-14111816
 ] 

Niklas Quarfot Nielsen commented on MESOS-1571:
-----------------------------------------------

[~tillt] Would you be up for shepherding this change?

How about having EXECUTOR_SHUTDOWN_TIMEOUT as an upper limit for the per-task 
configurable timeout?
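
As a rough illustration of the upper-limit idea, here is a minimal sketch. The
helper name effectiveEscalationTimeout and both of its parameters are
hypothetical and not part of Mesos; the sketch only shows the clamping and
assumes the upper limit itself is non-negative.

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical helper (not a Mesos API): clamps a per-task escalation
// timeout to an executor-wide upper limit.
// Assumption: upperLimit is non-negative.
std::chrono::milliseconds effectiveEscalationTimeout(
    std::chrono::milliseconds requested,
    std::chrono::milliseconds upperLimit)
{
  // Treat a negative request as zero so the result lies in [0, upperLimit].
  const std::chrono::milliseconds zero(0);
  return std::min(std::max(requested, zero), upperLimit);
}
```

With this shape, a task that asks for more than the limit simply gets the
limit, and a task that asks for less keeps its own value.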

I think we need to differentiate between two scenarios:
1) killTask() is called. In the command executor, this just calls its own 
shutdown() and _only_ the escalation in src/launcher/executor.cpp takes effect.

{code}
              Slave            Exec      CommandExecutor

                +               +               +
     killTask() |               |               |
      +--------->               |               |
                |   killTask()  |               |
                +--------------->               |
                |               |   killTask()  |
                |               +--------------->
                |               |               |
                |               |               +-------+
                |               |               |       |
                |               |               |       |
                |               |               <-------+
                |               |               | shutdown()
                |               |               | ^
                |               |               | |
                |               |               | | EXECUTOR_SIGNAL_ESCALATION_TIMEOUT
                |               |               | |
                |               |               | v
                |               |               | escalated()
                v               v               v
{code}

2) The executor is shut down due to frameworkShutdown. shutdown() is called in 
src/exec/exec.cpp, which in turn calls shutdown() on the underlying executor 
implementation. That is where we have the nested timeout, including an 
escalation within the slave (executor_shutdown_grace_period) which calls 
containerizer->destroy().

{code}
Slave            Exec      CommandExecutor                              
                                                                        
  +               +               +                                     
  |               |               |                                     
  |               |               |                                     
  |   shutdown()  |               |                                     
  +-^------------->               |                                     
  | |             |   shutdown()  |                                     
  | |             +-^-------------> shutdown()                          
  | |             | |             | ^                                   
  | |             | |             | |                                   
  | flags.        | SHUTDOWN_     | | EXECUTOR_SIGNAL_ESCALATION_TIMEOUT
  | shutdown_     | GRACE_PERIOD  | |                                   
  | grace_period  | |             | v                                   
  | |             | |             | escalated()                         
  | |             | v             |                                     
  | |             | ShutdownProcess                                     
  | |             | kill()        |                                     
  | v             |               |                                     
  | shutdownExecutorTimeout()     |                                     
  |               |               |                                     
  v               v               v                                     
                                                                        
    Containerizer->destroy()                                            

{code}

EXECUTOR_SHUTDOWN_GRACE_PERIOD is not configurable, but 
flags.executor_shutdown_grace_period in the slave is.
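
To make the nesting in scenario 2 easier to reason about, here is a small
sketch that orders the three escalation points by when they would fire. It is
not Mesos code: the struct, the function, and the simplifying assumption that
all three timers start at the same instant (message delays ignored) are mine.

```cpp
#include <algorithm>
#include <chrono>
#include <string>
#include <utility>
#include <vector>

typedef std::chrono::seconds Seconds;
typedef std::pair<Seconds, std::string> Event;

// Hypothetical model of the three nested timers in scenario 2.
struct ShutdownTimers {
  Seconds signalEscalation;  // command executor: shutdown() -> escalated()
  Seconds execGracePeriod;   // exec.cpp: shutdown() -> ShutdownProcess kill()
  Seconds slaveGracePeriod;  // slave: shutdown() -> shutdownExecutorTimeout()
};

// Returns the escalation points sorted by firing time.
// Assumption: all timers start together when shutdown() is first sent.
std::vector<Event> escalationOrder(const ShutdownTimers& t)
{
  std::vector<Event> events;
  events.emplace_back(t.signalEscalation, "escalated()");
  events.emplace_back(t.execGracePeriod, "ShutdownProcess::kill()");
  events.emplace_back(t.slaveGracePeriod, "shutdownExecutorTimeout()");

  // stable_sort keeps the inner-to-outer order when two timeouts are equal.
  std::stable_sort(
      events.begin(),
      events.end(),
      [](const Event& a, const Event& b) { return a.first < b.first; });

  return events;
}
```

With a 3 second signal escalation, a 5 second EXECUTOR_SHUTDOWN_GRACE_PERIOD,
and a slave flag of, say, 10 seconds, the order is escalated(),
ShutdownProcess::kill(), shutdownExecutorTimeout(). Setting the slave flag
below the other two puts containerizer destruction first, which is the case
worth testing.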

This suggests that we can start by looking at the command executor timeout 
alone (if I didn't miss anything). The upper bound for shutdown is 
EXECUTOR_SHUTDOWN_GRACE_PERIOD (5 seconds) already, so we can consider that 
next.
How about starting by making EXECUTOR_SHUTDOWN_GRACE_PERIOD configurable 
through src/slave/flags.hpp in one patch, and then working on a patch to add an 
escalation timeout to the command_info (alongside new tests)?

I do, however, find it a bit misleading that the executor_shutdown_grace_period 
flag only steps in if ShutdownProcess::kill() doesn't kill the executor. Can 
anyone clarify this?

> Signal escalation timeout is not configurable
> ---------------------------------------------
>
>                 Key: MESOS-1571
>                 URL: https://issues.apache.org/jira/browse/MESOS-1571
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Niklas Quarfot Nielsen
>            Assignee: Alexander Rukletsov
>
> Even though the executor shutdown grace period is set to a larger interval, 
> the signal escalation timeout will still be 3 seconds. It should either be 
> configurable or dependent on EXECUTOR_SHUTDOWN_GRACE_PERIOD.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
