[
https://issues.apache.org/jira/browse/OOZIE-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Satish Subhashrao Saley reassigned OOZIE-2326:
----------------------------------------------
Assignee: Satish Subhashrao Saley
> oozie/yarn/spark: active container remains after failed job
> -----------------------------------------------------------
>
> Key: OOZIE-2326
> URL: https://issues.apache.org/jira/browse/OOZIE-2326
> Project: Oozie
> Issue Type: Bug
> Components: workflow
> Affects Versions: 4.1.0
> Environment: pseudo-distributed (single VM), CentOS 6.6, CDH 5.4.3
> Reporter: Diana Carroll
> Assignee: Satish Subhashrao Saley
> Attachments: container-logs.txt, ooziejob-logs.txt, yarnbug1.png,
> yarnbug2.png
>
>
> The issue occurs when I launch a Spark job (local mode) that fails. (My
> example failed because it tried to read a non-existent file.) When this
> occurs, the job fails, and YARN ends up in a weird state: the
> ResourceManager shows the launcher job as completed, but a container for
> the job is still live on the slave node. Because I'm running in
> pseudo-distributed mode, this completely hangs my cluster: no other jobs
> can run, since there are only enough resources for a single container,
> and that container is occupied by the dead Oozie launcher.
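> A minimal workflow along these lines should reproduce the failure
> (assuming the Spark action is available in this Oozie build; the class,
> jar, and input path below are placeholders, not the exact ones from my
> test):
> {code:xml}
> <workflow-app name="spark-fail-repro" xmlns="uri:oozie:workflow:0.4">
>     <start to="spark-node"/>
>     <action name="spark-node">
>         <spark xmlns="uri:oozie:spark-action:0.1">
>             <job-tracker>${jobTracker}</job-tracker>
>             <name-node>${nameNode}</name-node>
>             <!-- Spark local mode, as in the report -->
>             <master>local[*]</master>
>             <name>SparkFailRepro</name>
>             <!-- placeholder driver class that reads the input path -->
>             <class>com.example.ReadMissingFile</class>
>             <jar>${nameNode}/user/${wf:user()}/lib/repro.jar</jar>
>             <!-- non-existent input path: the read fails and the job dies -->
>             <arg>${nameNode}/user/${wf:user()}/no-such-file.txt</arg>
>         </spark>
>         <ok to="end"/>
>         <error to="fail"/>
>     </action>
>     <kill name="fail">
>         <message>Spark action failed</message>
>     </kill>
>     <end name="end"/>
> </workflow-app>
> {code}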
> If I wait long enough, YARN eventually times out, releases the container,
> and starts accepting new jobs, but until then I'm dead in the water.
> Attaching screenshots that show the state right after running the failed
> job:
> - the ResourceManager shows no jobs running
> - the node shows one container running
> Also attaching a log file for the Oozie job and the container.
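> The same mismatch should also show up on the command line (assuming the
> standard YARN CLI): "yarn application -list" reports no running
> applications, while "yarn node -list" still shows one running container
> on the NodeManager.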
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)