[
https://issues.apache.org/jira/browse/AURORA-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Joshua Cohen reassigned AURORA-1614:
------------------------------------
Assignee: Joshua Cohen
> Failed sandbox initialization can cause tasks to go LOST
> --------------------------------------------------------
>
> Key: AURORA-1614
> URL: https://issues.apache.org/jira/browse/AURORA-1614
> Project: Aurora
> Issue Type: Bug
> Components: Executor
> Reporter: Joshua Cohen
> Assignee: Joshua Cohen
> Priority: Minor
>
> When we initialize the sandbox, we only catch sandbox-specific error types.
> If any other, unexpected error is raised, the executor hangs until the
> timeout is exceeded, at which point the task goes LOST.
> We should instead catch all exceptions raised during sandbox
> initialization and fail the task quickly.
> Additionally, the {{DockerDirectorySandbox}} did not properly catch errors
> raised during creation or symlinking, which led to the above problem in the
> event of a misconfiguration. In practice this issue should not occur in
> normal usage, but it slowed development until I tracked down what was
> causing the tasks to hang.
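
The two changes described above could look roughly like the following. This is a minimal sketch, not the actual executor code: SandboxError, SymlinkingSandbox and initialize_sandbox are hypothetical names invented for illustration, and the real DockerDirectorySandbox may differ in its structure and error types.

```python
import os


class SandboxError(Exception):
    """Hypothetical stand-in for a sandbox-specific error type."""


class SymlinkingSandbox(object):
    """Hypothetical sandbox that creates a directory and a symlink to it."""

    def __init__(self, root, link):
        self.root = root
        self.link = link

    def create(self):
        try:
            os.makedirs(self.root)
            os.symlink(self.root, self.link)
        except (IOError, OSError) as e:
            # Translate filesystem failures into the sandbox-specific type
            # so that callers see a well-defined error.
            raise SandboxError('Failed to create sandbox: %s' % e)


def initialize_sandbox(sandbox):
    """Return (ok, reason); no exception escapes to the caller."""
    try:
        sandbox.create()
    except SandboxError as e:
        return False, 'Sandbox error: %s' % e
    except Exception as e:
        # Broad catch: an unexpected error fails the task immediately
        # instead of leaving the caller waiting for a timeout.
        return False, 'Unexpected error: %s' % e
    return True, None
```

With this shape, a caller can mark the task failed as soon as initialize_sandbox returns a false first element, rather than waiting on initialization that will never complete.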