Hey Bjorn,

I think I figured out the issue. Some of the values for cgroups are still
hardcoded in Myriad. I'll open a JIRA ticket; hopefully we can get an update
into 0.2.0. I'll also reply to this thread once a pull request has been
submitted, in case you'd like to test it.

Darin

Hi all,

I have trouble starting the NM on the slave nodes. Apparently, it either
cannot find its configuration or something is wrong with the configuration.

With cgroups enabled, the NM does not start. The log below indicates that
something is wrong in the configuration. However,
yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
value used to be "${yarn.nodemanager.linux-container-executor.group}", as
suggested by the installation documentation, but I'm not sure whether this
self-reference is the correct approach.


==================================================
16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
        at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
        at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
        ... 3 more
Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.

        at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
        at org.apache.hadoop.util.Shell.run(Shell.java:460)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
        at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
        ... 4 more
==================================================
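On the self-reference question: a property whose value is "${<its own name>}" can only resolve to a concrete group name if that value is supplied from somewhere else. One example would be a Java system property passed with -D, which, as far as we know, Hadoop's Configuration consults during ${...} expansion. Also, as we understand it, the native container-executor binary that exits with code 24 reads its group setting from container-executor.cfg rather than from yarn-site.xml. The Python sketch below flags such self-referencing properties in a Hadoop-style XML configuration. The function names and the sample XML are hypothetical and exist only for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Matches ${name} placeholders as used in Hadoop configuration values.
VAR_PATTERN = re.compile(r"\$\{([^}]+)\}")


def load_properties(xml_text):
    """Parse a Hadoop-style <configuration> document into a {name: value} dict."""
    root = ET.fromstring(xml_text.strip())
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed <property> entries without a <name>
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def find_self_references(props):
    """Return the sorted names of properties whose value contains ${own-name}."""
    return sorted(
        name for name, value in props.items()
        if name in VAR_PATTERN.findall(value)
    )


if __name__ == "__main__":
    # Hypothetical yarn-site.xml excerpt mirroring the value discussed above.
    sample = """
    <configuration>
      <property>
        <name>yarn.nodemanager.linux-container-executor.group</name>
        <value>${yarn.nodemanager.linux-container-executor.group}</value>
      </property>
      <property>
        <name>yarn.nodemanager.container-executor.class</name>
        <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
      </property>
    </configuration>
    """
    for name in find_self_references(load_properties(sample)):
        print("self-referencing:", name)
```

For the sample, this prints "self-referencing: yarn.nodemanager.linux-container-executor.group". To check a real setup, pass in the contents of yarn-site.xml instead. The check only catches direct self-references, not longer cycles through other properties.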


I gave it another try with cgroups disabled (in
myriad-config-default.yml). This gets me a little further, but I'm still
stuck at running YARN jobs:

==================================================
16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_000001 transitioned from LOCALIZED to RUNNING
16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_000001/default_container_executor.sh]
16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_000001 is : 1
16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_000001 and exit code: 1
ExitCodeException exitCode=1:
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
        at org.apache.hadoop.util.Shell.run(Shell.java:460)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
        at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id: container_1457949199710_0001_01_000001
16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
==================================================

Unfortunately, the directory
/var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
is empty. According to the log, it is deleted after the failed
attempt.
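If it helps, YARN's yarn.nodemanager.delete.debug-delay-sec property (default 0) sets how many seconds the NodeManager waits after an application finishes before deleting that application's localized and log directories. Raising it should keep default_container_executor.sh and the container logs around long enough to inspect them. The Python sketch below sets or updates such a property in a Hadoop-style XML configuration. The helper name set_property and the value 600 are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def set_property(xml_text, name, value):
    """Return xml_text with <property> `name` set to `value` (appended if missing)."""
    root = ET.fromstring(xml_text.strip())
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value  # overwrite the existing value in place
            break
    else:
        # No matching property found: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Keep localized container directories for 10 minutes after an application
    # finishes, so the launch script and container logs can still be examined.
    print(set_property("<configuration></configuration>",
                       "yarn.nodemanager.delete.debug-delay-sec", "600"))
```

The NodeManager has to be restarted to pick up the change. ElementTree drops XML comments and the XML declaration, so review the output before overwriting a real yarn-site.xml with it.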

Again, any hint would be useful, including on how to activate cgroups.


Best regards,
Björn

--
Dipl.-Inform. Björn Hagemeier
Federated Systems and Data
Juelich Supercomputing Centre
Institute for Advanced Simulation

Phone: +49 2461 61 1584
Fax  : +49 2461 61 6656
Email: b.hageme...@fz-juelich.de
Skype: bhagemeier
WWW  : http://www.fz-juelich.de/jsc

JSC is the coordinator of the
John von Neumann Institute for Computing
and member of the
Gauss Centre for Supercomputing

-------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Registered office: Juelich
Registered in the commercial register of the Dueren District Court, No. HR B 3498
Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
Board of Directors: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman),
Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
-------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------
