Hey Bjorn,

sorry for the delay. Comparing your exceptions with my own experience, I
believe some cgroup configs were left in the node manager's yarn-site.xml.
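
In case it helps to check that quickly, here is a minimal sketch that lists
the cgroup and LinuxContainerExecutor related properties in a yarn-site.xml
and flags any value that is only a placeholder for its own property name
(like the "${yarn.nodemanager.linux-container-executor.group}" value
mentioned further down). The marker substrings, the function names and the
inline sample are my own assumptions for illustration; nothing here comes
from Myriad itself.

```python
import xml.etree.ElementTree as ET

# Substrings that mark properties tied to cgroups or the
# LinuxContainerExecutor (an assumed, non-exhaustive list).
MARKERS = ("cgroups", "linux-container-executor", "container-executor.class")

# Hypothetical sample standing in for a real yarn-site.xml.
SAMPLE = """
<configuration>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>${yarn.nodemanager.linux-container-executor.group}</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
</configuration>
"""


def find_cgroup_properties(root):
    """Return {name: value} for properties whose name contains a marker."""
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if any(marker in name for marker in MARKERS):
            found[name] = value
    return found


def self_referential(props):
    """Return names whose value is just '${<the same name>}'."""
    return [name for name, value in props.items()
            if value == "${" + name + "}"]


props = find_cgroup_properties(ET.fromstring(SAMPLE))
for name, value in sorted(props.items()):
    print(f"{name} = {value}")
for name in self_referential(props):
    print(f"self-referential placeholder: {name}")
```

To run it against a real file, replace `ET.fromstring(SAMPLE)` with
`ET.parse(path).getroot()`, where `path` points at the yarn-site.xml the
node manager actually reads. Anything it prints while cgroups is meant to be
disabled is a candidate for a leftover setting.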
On Mar 18, 2016 2:58 AM, "Björn Hagemeier" <b.hageme...@fz-juelich.de>
wrote:

> Hi Darin,
>
> thanks a lot for this. But what about the other case below, when cgroups
> is disabled?
>
>
> Björn
>
> Am 18.03.2016 um 00:25 schrieb Darin Johnson:
> > Hey Bjorn,
> >
> > I think I figured out the issue. Some of the values for cgroups are still
> > hardcoded in Myriad. I'll file a JIRA ticket; hopefully we can get an
> > update into 0.2.0. I'll also respond to this thread after a pull request
> > is submitted, in case you'd like to test it.
> >
> > Darin
> > Hi all,
> >
> > I have trouble starting the NM on the slave nodes. Apparently, it does
> > not find its configuration, or something is wrong with the configuration.
> >
> > With cgroups enabled, the NM does not start. The logs contain the
> > following, indicating that there is something wrong in the configuration.
> > However, yarn.nodemanager.linux-container-executor.group is set (to
> > "yarn"). The value used to be
> > "${yarn.nodemanager.linux-container-executor.group}", as indicated by the
> > installation documentation, but I'm uncertain whether this recursion is
> > the correct approach.
> >
> >
> > ==================================================
> > 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
> >         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
> >         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> >         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
> >         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
> > Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
> >         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
> >         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
> >         ... 3 more
> > Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.
> >
> >         at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> >         at org.apache.hadoop.util.Shell.run(Shell.java:460)
> >         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> >         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
> >         ... 4 more
> > ==================================================
> >
> >
> > I have given it another try with cgroups disabled (in
> > myriad-config-default.yml). I seem to get a little further, but I am
> > still stuck at running YARN jobs:
> >
> > ==================================================
> > 16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_000001 transitioned from LOCALIZED to RUNNING
> > 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_000001/default_container_executor.sh]
> > 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_000001 is : 1
> > 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_000001 and exit code: 1
> > ExitCodeException exitCode=1:
> >         at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> >         at org.apache.hadoop.util.Shell.run(Shell.java:460)
> >         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> >         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
> >         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> >         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> >         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >         at java.lang.Thread.run(Thread.java:745)
> > 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
> > 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id: container_1457949199710_0001_01_000001
> > 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
> > ==================================================
> >
> > Unfortunately, the directory
> > /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
> > is empty; the log indicates that it is deleted after the failed
> > attempt.
> >
> > Again, any hint would be useful, also regarding the activation of cgroups.
> >
> >
> > Best regards,
> > Björn
> >
> > --
> > Dipl.-Inform. Björn Hagemeier
> > Federated Systems and Data
> > Juelich Supercomputing Centre
> > Institute for Advanced Simulation
> >
> > Phone: +49 2461 61 1584
> > Fax  : +49 2461 61 6656
> > Email: b.hageme...@fz-juelich.de
> > Skype: bhagemeier
> > WWW  : http://www.fz-juelich.de/jsc
> >
> > JSC is the coordinator of the
> > John von Neumann Institute for Computing
> > and member of the
> > Gauss Centre for Supercomputing
> >
> >
> > -------------------------------------------------------------------------------------
> > -------------------------------------------------------------------------------------
> > Forschungszentrum Juelich GmbH
> > 52425 Juelich
> > Sitz der Gesellschaft: Juelich
> > Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> > Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
> > Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> > Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> > Prof. Dr. Sebastian M. Schmidt
> >
> > -------------------------------------------------------------------------------------
> > -------------------------------------------------------------------------------------
> >
>
>
>
