[ https://issues.apache.org/jira/browse/MESOS-1571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14111816#comment-14111816 ]
Niklas Quarfot Nielsen commented on MESOS-1571:
-----------------------------------------------

[~tillt] Would you be up for shepherding this change?

How about having EXECUTOR_SHUTDOWN_TIMEOUT as an upper limit for the per-task configurable timeout?

I think we need to differentiate between two scenarios:

1) killTask() is called. In the command executor, this just calls its own shutdown(), and _only_ the escalation in src/launcher/executor.cpp takes effect.

{code}
             Slave               Exec            CommandExecutor
               +                   +                    +
  killTask()   |                   |                    |
  +----------> |                   |                    |
               | killTask()        |                    |
               +-----------------> |                    |
               |                   | killTask()         |
               |                   +------------------> |
               |                   |                    |
               |                   |                    +-------+
               |                   |                    |       |
               |                   |                    | <-----+
               |                   |                    | shutdown()
               |                   |                    |   ^
               |                   |                    |   |
               |                   |                    |   | EXECUTOR_SIGNAL_ESCALATION_TIMEOUT
               |                   |                    |   |
               |                   |                    |   v
               |                   |                    | escalated()
               v                   v                    v
{code}

2) The executor is shut down due to frameworkShutdown. shutdown() is called in src/exec/exec.cpp, which in turn calls shutdown() on the underlying executor implementation. That is where the nested timeouts come in, including an escalation within the slave (executor_shutdown_grace_period) that calls containerizer->destroy().

{code}
    Slave                       Exec                      CommandExecutor
      +                           +                             +
      |                           |                             |
      | shutdown()                |                             |
      +-^-----------------------> |                             |
      | |                         | shutdown()                  |
      | |                         +-^-------------------------> | shutdown()
      | |                         | |                           |   ^
      | | flags.                  | | SHUTDOWN_                 |   | EXECUTOR_SIGNAL_ESCALATION_TIMEOUT
      | | shutdown_               | | GRACE_PERIOD              |   |
      | | grace_period            | |                           |   v
      | |                         | |                           | escalated()
      | |                         | v                           |
      | |                         | ShutdownProcess             |
      | |                         | kill()                      |
      | v                         |                             |
      | shutdownExecutorTimeout()                               |
      |                           |                             |
      v                           v                             v
  Containerizer->destroy()
{code}

EXECUTOR_SHUTDOWN_GRACE_PERIOD is not configurable, but flags.executor_shutdown_grace_period in the slave is. This suggests that we can start by looking at the command executor timeout alone (unless I missed something). The upper bound for shutdown is already EXECUTOR_SHUTDOWN_GRACE_PERIOD (5 seconds), so we can consider that next.
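To make the "upper limit for the per-task configurable timeout" idea concrete, here is a minimal standalone C++17/POSIX sketch. It is not Mesos code. It caps a per-task escalation timeout at the executor grace period and then performs the escalation, assuming the escalation follows the usual SIGTERM-then-SIGKILL pattern. The names `effectiveEscalationTimeout`, `terminateWithEscalation`, `EscalationResult` and `spawnIdleChild` are hypothetical. The 3 s and 5 s values are local stand-ins for the numbers quoted above, and the sketch signals a single direct child process only.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <optional>
#include <thread>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

using std::chrono::milliseconds;

// Hypothetical stand-ins for the values quoted above (3 s escalation,
// 5 s EXECUTOR_SHUTDOWN_GRACE_PERIOD); not the Mesos constants themselves.
constexpr milliseconds kDefaultEscalationTimeout{3000};
constexpr milliseconds kExecutorShutdownGracePeriod{5000};

// Hypothetical: pick the escalation timeout for one task. A per-task value
// overrides the default but is capped at the grace period, so SIGKILL is
// sent before the enclosing shutdown timeout expires.
milliseconds effectiveEscalationTimeout(
    std::optional<milliseconds> perTask,
    milliseconds upperBound = kExecutorShutdownGracePeriod) {
  assert(upperBound > milliseconds::zero());
  const milliseconds requested = perTask.value_or(kDefaultEscalationTimeout);
  return std::clamp(requested, milliseconds::zero(), upperBound);
}

struct EscalationResult {
  bool escalated;  // true if SIGKILL had to be sent
  int signal;      // signal that terminated the child, or 0 on normal exit
};

// Send SIGTERM, poll for exit until the timeout, then escalate to SIGKILL.
// Assumes `pid` is a direct child of the caller so waitpid() can reap it.
EscalationResult terminateWithEscalation(pid_t pid, milliseconds timeout) {
  assert(pid > 0);
  kill(pid, SIGTERM);

  int status = 0;
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  while (std::chrono::steady_clock::now() < deadline) {
    if (waitpid(pid, &status, WNOHANG) == pid) {
      return {false, WIFSIGNALED(status) ? WTERMSIG(status) : 0};
    }
    std::this_thread::sleep_for(milliseconds(10));
  }

  // Timeout exhausted: escalate and reap the child.
  kill(pid, SIGKILL);
  waitpid(pid, &status, 0);
  return {true, WIFSIGNALED(status) ? WTERMSIG(status) : 0};
}

// Demo helper: fork a child that idles forever. With ignoreSigterm the child
// inherits SIG_IGN for SIGTERM (set before fork(), so there is no race),
// which leaves SIGKILL as the only way to stop it.
pid_t spawnIdleChild(bool ignoreSigterm) {
  void (*previous)(int) = SIG_DFL;
  if (ignoreSigterm) previous = signal(SIGTERM, SIG_IGN);
  const pid_t pid = fork();
  if (pid == 0) {
    for (;;) pause();  // child: wait for a terminating signal
  }
  if (ignoreSigterm) signal(SIGTERM, previous);  // parent: restore disposition
  return pid;
}
```

With these stand-in values, a task that asks for 30 s gets 5 s, and a task that asks for nothing keeps the 3 s default. A child spawned with `spawnIdleChild(true)` ignores SIGTERM, so `terminateWithEscalation` has to fall back to SIGKILL once the (capped) timeout runs out.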
How about starting by making EXECUTOR_SHUTDOWN_GRACE_PERIOD configurable through src/slave/flags.hpp in one patch, and then working on a patch that adds an escalation timeout to the command_info (alongside new tests)?

However, I find it a bit misleading that the executor_shutdown_grace_period flag only steps in if ShutdownProcess::kill() doesn't kill the executor. Can anyone clarify this?

> Signal escalation timeout is not configurable
> ---------------------------------------------
>
>                 Key: MESOS-1571
>                 URL: https://issues.apache.org/jira/browse/MESOS-1571
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Niklas Quarfot Nielsen
>            Assignee: Alexander Rukletsov
>
> Even though the executor shutdown grace period is set to a larger interval,
> the signal escalation timeout will still be 3 seconds. It should either be
> configurable or dependent on EXECUTOR_SHUTDOWN_GRACE_PERIOD.
> Thoughts?

--
This message was sent by Atlassian JIRA
(v6.2#6252)