Joshua Cohen created AURORA-1614:
------------------------------------
Summary: Failed sandbox initialization can cause tasks to go LOST
Key: AURORA-1614
URL: https://issues.apache.org/jira/browse/AURORA-1614
Project: Aurora
Issue Type: Bug
Components: Executor
Reporter: Joshua Cohen
Priority: Minor
When we initialize the sandbox, we only catch sandbox-specific error types.
If any other, unexpected error is raised, the executor just hangs until
the timeout is exceeded, at which point the task goes LOST.
We should instead broadly catch exceptions raised during sandbox initialization
and quickly fail tasks.
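
A minimal sketch of the handling proposed above. The names {{SandboxError}},
{{initialize_sandbox}}, and the status strings are hypothetical placeholders,
not the executor's actual API:

```python
class SandboxError(Exception):
    """Hypothetical stand-in for the sandbox-specific error types."""


def initialize_sandbox(create):
    """Call `create` (a hypothetical sandbox-creation callable).

    Returns a (status, reason) tuple so the caller can fail the task
    right away instead of waiting for a timeout.
    """
    try:
        create()
    except SandboxError as e:
        # Expected failure mode: already handled today.
        return ("FAILED", "Sandbox error: %s" % e)
    except Exception as e:
        # Anything unexpected: fail fast rather than hang until LOST.
        return ("FAILED", "Unexpected error initializing sandbox: %s" % e)
    return ("OK", None)
```

The broad {{except Exception}} is deliberate here: the goal is to turn any
initialization failure into a prompt task failure with a readable reason.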
Additionally, the {{DockerDirectorySandbox}} was not properly catching errors
raised when creating/symlinking, which led to the above problem in the event of
a misconfiguration. In practice, this issue shouldn't have occurred in normal
usage, but it made development slow until I tracked down what was causing the
tasks to just hang.
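
As an illustration of the second point, OS-level failures can be wrapped in a
sandbox-specific error so existing handlers see them. This is only a sketch:
{{SandboxCreationError}} and {{create_and_link}} are hypothetical names, and it
assumes the failing operations are {{os.makedirs}} and {{os.symlink}} calls
(Python 3, for {{exist_ok}}):

```python
import os


class SandboxCreationError(Exception):
    """Hypothetical sandbox-specific error type."""


def create_and_link(target_dir, link_path):
    """Create target_dir and symlink link_path to it.

    Any OS-level failure (permissions, existing path, bad mount) is
    re-raised as SandboxCreationError instead of escaping as a raw OSError.
    """
    try:
        os.makedirs(target_dir, exist_ok=True)
        os.symlink(target_dir, link_path)
    except (IOError, OSError) as e:
        raise SandboxCreationError(
            "Failed to create or symlink sandbox directory: %s" % e)
```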