[
https://issues.apache.org/jira/browse/AURORA-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14596764#comment-14596764
]
brian wickman commented on AURORA-698:
--------------------------------------
{noformat}
commit 73ceeb22a18e4b3df3bffb04cf7d58527066fb5a
Author: Brian Wickman <[email protected]>
Date: Mon Jun 1 15:20:25 2015 -0700
Daemonize all deadline calls in aurora executor.

If we do not daemonize, it's possible for the aurora executor to send
TASK_KILLED and then block indefinitely on shutdown. This way the aurora
executor process will at least exit, allowing the cgroup to tear down all
active processes.

Testing Done:
./pants test src/test/python/apache/aurora/executor::

Bugs closed: AURORA-698
Reviewed at https://reviews.apache.org/r/34484/
{noformat}
> aurora executor _shutdown deadline calls should be daemonized
> -------------------------------------------------------------
>
> Key: AURORA-698
> URL: https://issues.apache.org/jira/browse/AURORA-698
> Project: Aurora
> Issue Type: Bug
> Components: Executor
> Reporter: brian wickman
> Assignee: brian wickman
>
> In the aurora executor shutdown method, we have deadline() calls:
> {noformat}
> def _shutdown(self, status_result):
>   runner_status = self._runner.status
>   try:
>     deadline(self._runner.stop, timeout=self.STOP_TIMEOUT)
>   except Timeout:
>     log.error('Failed to stop runner within deadline.')
>   try:
>     deadline(self._chained_checker.stop, timeout=self.STOP_TIMEOUT)
>   except Timeout:
>     log.error('Failed to stop all checkers within deadline.')
>   # If the runner was alive when _shutdown was called, defer to the status_result,
>   # otherwise the runner's terminal state is the preferred state.
>   exit_status = runner_status or status_result
>   self.send_update(
>       self._driver,
>       self._task_id,
>       exit_status.status,
>       status_result.reason)
>   self.terminated.set()
>   defer(self._driver.stop, delay=self.PERSISTENCE_WAIT)
> {noformat}
> However, if runner.stop fails with a Timeout exception, the spawned
> AnonymousThread is not daemonized, so it keeps the executor from exiting.
> The cgroup is then never torn down, and if runner.stop really did fail,
> the process can stay alive even though TASK_KILLED was delivered.
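
The failure mode described above can be sketched in a few lines. The `deadline` helper below is a hypothetical stand-in written for illustration, not the function the executor actually imports; the `daemon` parameter, the `Timeout` class, and the `release` event are all assumptions made for this example. It shows only the general mechanism: a helper thread that outlives its deadline keeps the interpreter alive unless it is marked as a daemon.

```python
import threading


class Timeout(Exception):
    """Raised when the wrapped call does not finish in time (illustrative only)."""


def deadline(closure, timeout, daemon=False):
    """Hypothetical sketch: run closure in a helper thread and raise Timeout
    if it is still running after `timeout` seconds."""
    result = {}

    def target():
        result['value'] = closure()

    worker = threading.Thread(target=target)
    # A daemon thread does not keep the interpreter alive at exit. With
    # daemon=False, a stuck closure would block process shutdown.
    worker.daemon = daemon
    worker.start()
    worker.join(timeout)
    if worker.is_alive():
        raise Timeout('call did not finish within %s seconds' % timeout)
    return result.get('value')


# Stand-in for a runner.stop that hangs: it waits on an event nobody sets.
release = threading.Event()

try:
    deadline(lambda: release.wait(), timeout=0.1, daemon=True)
    timed_out = False
except Timeout:
    timed_out = True
```

With `daemon=True` the caller sees the `Timeout` and the process can still exit while the helper thread is stuck. With `daemon=False` the same `Timeout` is raised, but the interpreter would then wait on the stuck thread at shutdown. This sketch does not propagate exceptions raised inside the closure.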