David,

When local disks on the host running the NodeManager are more than 90% full,
the NodeManager reports a message like "10/12 local-dirs are bad:". In such
cases, the NodeManager service keeps running but does not service any
applications.

Check whether multiple disks on the host are more than 90% full.
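
If it helps, here is a rough sketch of that check in Python. It only compares
filesystem usage against the 90% figure above. The /grid/N/hadoop/yarn/local
paths are taken from the health report quoted below, and the helper names
(percent_used, dirs_over_threshold) are made up for illustration. As far as I
know the actual cutoff comes from
yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage
in yarn-site.xml, so please verify the value configured on your cluster.

```python
import os
import shutil

# Assumption: 90% is the cutoff mentioned in this mail; your cluster may differ.
THRESHOLD_PERCENT = 90.0


def percent_used(usage):
    """Return used space as a percentage of total for a disk_usage-style tuple."""
    return 100.0 * usage.used / usage.total


def dirs_over_threshold(paths, threshold=THRESHOLD_PERCENT,
                        usage_fn=shutil.disk_usage):
    """Return (path, percent) pairs for paths whose filesystem exceeds threshold."""
    flagged = []
    for path in paths:
        pct = percent_used(usage_fn(path))
        if pct > threshold:
            flagged.append((path, round(pct, 1)))
    return flagged


if __name__ == "__main__":
    # Paths follow the pattern in the quoted health report; adjust for your host.
    local_dirs = ["/grid/%d/hadoop/yarn/local" % i for i in range(12)]
    # Skip directories that do not exist on this machine.
    existing = [p for p in local_dirs if os.path.isdir(p)]
    for path, pct in dirs_over_threshold(existing):
        print("%s is %.1f%% full" % (path, pct))
```

Run it on the unhealthy host as a user that can see those mount points. The
usage_fn parameter is only there so the check can be exercised with fake
numbers.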

Hope this helps!

Manoj

On Tue, Apr 3, 2018 at 10:59 PM, Gour Saha <gs...@hortonworks.com> wrote:

> Can you check the slider agent logs and the application logs in those
> containers to see if they are failing with some exception?
>
> The fishy things I found in the AM log are messages like these, saying
> "local-dirs are bad". Can you check what's going on with these dirs?
>
> 2018-04-03 18:38:28,200 [AMRM Callback Handler Thread] INFO appmaster.SliderAppMaster - onNodesUpdated(1)
> 2018-04-03 18:38:28,376 [AMRM Callback Handler Thread] INFO appmaster.SliderAppMaster - Updated nodes [nodeId { host: "***" port: 45454 } httpAddress: "***:8042" rackName: "/EI105" used { memory: 0 virtual_cores: 0 } capability { memory: 364544 virtual_cores: 38 } node_state: NS_UNHEALTHY health_report: "10/12 local-dirs are bad: /grid/9/hadoop/yarn/local,/grid/2/hadoop/yarn/local,/grid/1/hadoop/yarn/local,/grid/5/hadoop/yarn/local,/grid/11/hadoop/yarn/local,/grid/3/hadoop/yarn/local,/grid/8/hadoop/yarn/local,/grid/6/hadoop/yarn/local,/grid/0/hadoop/yarn/local,/grid/7/hadoop/yarn/local; 10/12 log-dirs are bad: /grid/6/hadoop/yarn/log,/grid/8/hadoop/yarn/log,/grid/2/hadoop/yarn/log,/grid/1/hadoop/yarn/log,/grid/5/hadoop/yarn/log,/grid/11/hadoop/yarn/log,/grid/7/hadoop/yarn/log,/grid/9/hadoop/yarn/log,/grid/0/hadoop/yarn/log,/grid/3/hadoop/yarn/log" last_health_report_time: 1522798707678]
>
> -Gour
>
> On 4/3/18, 10:49 PM, "David.Serafini" <david.seraf...@target.com> wrote:
>
>     I've attached what I can find.
>
>
>     On 4/3/18, 10:38 PM, Gour Saha <gs...@hortonworks.com> wrote:
>
>         Can you share the logs of the dying containers and the AM to
>         debug further?
>
>         -Gour
>
>         On 4/3/18, 6:49 PM, "David.Serafini" <david.seraf...@target.com> wrote:
>
>             I've been using slider 0.91 for a year and it's been very stable lately.
>             I built 0.92 to test it and my yarn containers are dying after 10 minutes.
>             Slider restarts them successfully, but this isn't acceptable behavior.
>             Any thoughts on what could be going on?
>
>             I looked for some kind of release notes for 0.92, but didn't find
>             anything except a list of ticket ids.
>             Is there some configuration in my job that I should have changed
>             to use 0.92?
>
>             Thanks,
>             -david
>
