On Wed, May 4, 2016 at 2:14 PM, Terence Yim <[email protected]> wrote:

> Hi Keith,
>
> What is the Hadoop version you are using? Judging from the log, it could be
> a bug in the Capacity scheduler[1].
>

I am using Hadoop 2.6.3, so that bug should already be fixed.


> Also, have you looked at the node manager log of the node "worker14:40196"?
>

No, I had not; that's a good idea.  I grepped that log for the YARN app ID
1462212200762_0008 and saw nothing pertinent.  I also looked at entries
around the time of the RM error message and saw nothing pertinent.


>
> [1] https://issues.apache.org/jira/browse/YARN-2628
>
> Terence
>
> On Wed, May 4, 2016 at 8:44 AM, Keith Turner <[email protected]> wrote:
>
> > I ran into an issue where YARN does not seem to be starting containers
> > again for an application after some containers died.  The details of the
> > issue I am running into are outlined in fluo#657 [1].
> >
> > Twill seems to be trying to restart the containers, but it seems YARN is
> > not doing it.  Looking at the YARN RM web page, there are enough cores
> > and memory available to start the containers, so I am not sure why it's
> > not starting them.
> >
> > Does anyone have any tips for debugging this issue, or have a second to
> > look at the logs attached to fluo#657?
> >
> > [1] : https://github.com/fluo-io/fluo/issues/657
> >
>
