For (2), sorry, I get a bit confused when "executor" and "CommandExecutor" are used interchangeably.
So let me confirm we're on the same page: Task finalization is only relevant for the CommandExecutor; that is where the CommandExecutor assumes that all tasks need a similar grace period. For regular executors, the finalization of a task is entirely in the executor's hands; the slave does not use a grace period here to impose anything. For all executors (CommandExecutor included), a grace period will be used when the executor is being shut down.

Does this match your understanding?

On Fri, Nov 14, 2014 at 5:37 AM, Alex Rukletsov <[email protected]> wrote:

> (1) When the ShutdownProcess is gone, we'll adjust the timeout
> calculation logic.
>
> (2) In the proposal, we tie the grace period to the executor, but use
> it for task finalization. Effectively we assume that each executor
> launches similar tasks. This may seem a bit weird, but having timeouts
> per task makes the change more intrusive. I would propose to revisit
> this concept when we start discussing a more general "finalization"
> approach.
>
> (3) I think the concept of custom finalization may be useful. Anyway,
> we need to have a timeout for the cases when finalization is stuck or
> takes too much time. It would be nice to have a single flag for such
> timeout. Now we already have two: one for executor shutdown and one
> for the docker stop timeout.
>
> I'm glad that we all agree current grace shutdown configuration needs
> some love. I'll follow up with the patches soon.
>
> On Thu, Nov 13, 2014 at 9:59 PM, Benjamin Mahler <[email protected]>
> wrote:
>
> > Short term, cleaning up the current static configuration mess
> > sounds good.
> >
> > Some food for thought for the longer term:
> >
> > (1) Keep in mind that the ShutdownProcess inside ExecutorProcess
> > will be going away when we have pure language bindings.
> >
> > (2) Killing a task and shutting down an executor are independent
> > concepts. Unfortunately, executor shutdown is not exposed in the
> > framework API yet. So, when it comes to custom executors, the grace
> > period (or whatever other finalization [1]) is *completely* in the
> > hands of the framework.
> >
> > (3) We provide CommandExecutor as a convenience, and as we've
> > discovered broadly useful concepts for frameworks, like health
> > checking, we've added them in. It sounds like doing "finalization"
> > might be another broadly useful concept, wherein much like they can
> > control the definition of a health check, they will want to control
> > the definition of "finalization".
> >
> > Thoughts?
> >
> > [1] I believe Thermos exposes some finalization, might be useful to
> > reference:
> >
> > http://aurora.incubator.apache.org/documentation/latest/configuration-reference/#final
> >
> > On Wed, Nov 12, 2014 at 3:06 PM, Niklas Nielsen <[email protected]>
> > wrote:
> >
> > > I thought signal escalation as per-executor or actually
> > > everywhere where we execute a command info as a subprocess.
> > > The new grace period is meant as the time an executor has to
> > > finish off its things - changing the other timeouts had to be
> > > done as they will in most cases be shorter.
> > > For custom executors, it is up to themselves to honor the
> > > timeout; or else, the executor process will kill it after
> > > timeout + delta time.
> > >
> > > Ben, are you thinking of a more generalized finalization
> > > mechanism (pluggable, programmable)?
> > >
> > > Niklas
> > >
> > > On 11 November 2014 10:34, Alex Rukletsov <[email protected]>
> > > wrote:
> > >
> > > > Ben,
> > > >
> > > > there are two scenarios: executor shutdown and killTask() in
> > > > CommandExecutor. For the first use case, each custom executor
> > > > is affected through the ExecutorProcess, that means two levels
> > > > are involved (containerizer and executor) and should be
> > > > synchronized.
> > > >
> > > > In the second scenario, each task is tied to its own
> > > > CommandExecutor, therefore killing a task implies killing its
> > > > executor. In this case, grace shutdown period becomes also a
> > > > signal escalation timeout and conflating them together, I
> > > > think, is a good idea. The proposed design doc is an effort to
> > > > align timeouts along the chain from slave to CommandExecutor.
> > > >
> > > > If I understand you correctly, we want to shutdown any executor
> > > > (task) gracefully, and do not tie grace period to
> > > > CommandExecutor only. A good example pointed by Ankur Chauhan
> > > > is MESOS-1925
> > > > <https://issues.apache.org/jira/browse/MESOS-1925>: we can
> > > > reuse the same grace shutdown flag for dockers. And if we later
> > > > enable frameworks to adjust timeouts for its tasks (or
> > > > executors, to be precise), we will be able to align the timeout
> > > > used by docker finalization with the timeout in docker
> > > > container.
> > > >
> > > > On Mon, Nov 10, 2014 at 10:00 PM, Benjamin Mahler
> > > > <[email protected]> wrote:
> > > >
> > > > > I'm guessing most of the motivation here is actually for task
> > > > > killing escalation in the command executor? The shutdown
> > > > > grace period was designed for executor shutdown only, which
> > > > > today occurs only when the framework is being shutdown (or
> > > > > recovery is cleaning up), or in the future, when frameworks
> > > > > ask to shutdown a specific executor.
> > > > >
> > > > > In the case of the command executor, the slave won't do any
> > > > > escalation when a killTask arrives, since it's not trying to
> > > > > shutdown the executor. For simplicity (I'm guessing), we
> > > > > conflated the executor shutdown grace period, with the
> > > > > killTask signal escalation in the command executor.
> > > > >
> > > > > So, I'm still trying to figure out the concrete use case
> > > > > here, is it that you have command-tasks that implement a
> > > > > clean shutdown driven by SIGTERM? Going forward, is that
> > > > > enough or would we want a more general notion of
> > > > > "Finalization" (e.g. driven by HTTP, or SIGTERM, or
> > > > > subprocess, etc), much like the generic health checking that
> > > > > was added.
> > > > >
> > > > > On Mon, Nov 10, 2014 at 8:08 AM, Alex Rukletsov
> > > > > <[email protected]> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I would like to share the design doc for configurable grace
> > > > > > period
> > > > > > <https://docs.google.com/document/d/1_b3OPv3tjkub1T6VhQ27GnDfbVjnJ6IQ4ufPQhV1HM8/edit?usp=sharing>.
> > > > > > The doc describes two approaches to calculate nested grace
> > > > > > periods, points out implementation details and opens
> > > > > > several design questions.
> > > > > >
> > > > > > I would highly appreciate any thoughts, ideas and
> > > > > > suggestions!
> > > > > >
> > > > > > Thanks,
> > > > > > Alex
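
As a footnote to the thread: the signal escalation discussed above (SIGTERM first, SIGKILL once the grace period runs out) can be sketched roughly as below. This is only an illustration, not Mesos or CommandExecutor code. The function name `stop_with_escalation`, its return values, and the timeout used in the demo are made up for the example, and it assumes a POSIX system.

```python
import subprocess
import sys


def stop_with_escalation(proc: subprocess.Popen, grace_period: float) -> str:
    """Ask `proc` to exit; force-kill it if the grace period expires.

    Hypothetical helper for illustration only.
    Returns "graceful" if the process exited within the grace period,
    "killed" if it had to be force-killed.
    """
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_period)
        return "graceful"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"


if __name__ == "__main__":
    # A child that would otherwise sleep for a minute.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    print(stop_with_escalation(child, grace_period=3.0))
```

A task that handles SIGTERM and cleans up in time takes the "graceful" path; one that ignores the signal or gets stuck is killed after the grace period. That is why the grace period effectively doubles as the escalation timeout in the killTask() case.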
