On Wed, May 4, 2016 at 2:14 PM, Terence Yim <[email protected]> wrote:
> Hi Keith,
>
> What is the Hadoop version you are using? Judging from the log, it could be
> a bug in the Capacity scheduler[1].
>

I am using Hadoop 2.6.3, so that bug should already be fixed.

> Also, have you looked at the node manager log of the node "worker14:40196"?
>

No, I had not; that's a good idea. I grepped that log for the YARN app id
1462212200762_0008 and saw nothing pertinent. I also looked around the time
of the error message in the RM and saw nothing pertinent.

>
> [1] https://issues.apache.org/jira/browse/YARN-2628
>
> Terence
>
> On Wed, May 4, 2016 at 8:44 AM, Keith Turner <[email protected]> wrote:
>
> > I ran into an issue where Yarn does not seem to be starting containers
> > again for an application after some containers died. The details of the
> > issue I am running into are outlined in fluo#657 [1].
> >
> > Twill seems to be trying to restart the containers, but it seems YARN is
> > not doing it. Looking at the YARN RM web page, there are enough cores and
> > memory available to start the containers, so I am not sure why it's not
> > starting them.
> >
> > Does anyone have any tips for debugging this issue or have a second to
> > look at the logs attached to fluo#657?
> >
> > [1] : https://github.com/fluo-io/fluo/issues/657
> >
>
