Hi Keith,

Which Hadoop version are you using? Judging from the log, it could be a bug in the Capacity Scheduler [1]. Also, have you looked at the NodeManager log of the node "worker14:40196"?

[1] https://issues.apache.org/jira/browse/YARN-2628

Terence

On Wed, May 4, 2016 at 8:44 AM, Keith Turner <[email protected]> wrote:

> I ran into an issue where YARN does not seem to be starting containers again
> for an application after some containers died. The details of the issue I
> am running into are outlined in fluo#657 [1].
>
> Twill seems to be trying to restart the containers, but it seems YARN is
> not doing it. Looking at the YARN RM web page, there are enough cores and
> memory available to start the containers, so I am not sure why it's not
> starting them.
>
> Does anyone have any tips for debugging this issue, or have a second to look
> at the logs attached to fluo#657?
>
> [1] : https://github.com/fluo-io/fluo/issues/657
>
