Hey Bjorn,

Sorry for the delay. Looking at the difference between the exceptions and my own experience, I believe you left some cgroup configs in the NodeManager's yarn-site.xml.

On Mar 18, 2016 2:58 AM, "Björn Hagemeier" <b.hageme...@fz-juelich.de> wrote:
> Hi Darin,
>
> thanks a lot for this. But what about the other case below, when cgroups
> is disabled?
>
> Björn
>
> On 18.03.2016 at 00:25, Darin Johnson wrote:
> > Hey Bjorn,
> >
> > I think I figured out the issue. Some of the values for cgroups are still
> > hardcoded in Myriad. I'll add a JIRA ticket; hopefully we can get an update
> > for 0.2.0. I'll also respond to this thread after a pull request is
> > submitted, in case you'd like to test it.
> >
> > Darin
> >
> > Hi all,
> >
> > I have trouble starting the NM on the slave nodes. Apparently, it does
> > not find its configuration, or something is wrong with the configuration.
> >
> > With cgroups enabled, the NM does not start. The logs contain the error
> > below, indicating that there is something wrong in the configuration. However,
> > yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
> > value used to be "${yarn.nodemanager.linux-container-executor.group}" as
> > indicated by the installation documentation; however, I'm uncertain
> > whether this recursion is the correct approach.
> >
> > ==================================================
> > 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
> >     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
> >     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> >     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
> >     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
> > Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
> >     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
> >     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
> >     ... 3 more
> > Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.
> >
> >     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> >     at org.apache.hadoop.util.Shell.run(Shell.java:460)
> >     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> >     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
> >     ... 4 more
> > ==================================================
> >
> > I have given it another try with cgroups disabled (in
> > myriad-config-default.yml). I seem to get a little further, but I am still
> > stuck at running YARN jobs:
> >
> > ==================================================
> > 16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_000001 transitioned from LOCALIZED to RUNNING
> > 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_000001/default_container_executor.sh]
> > 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_000001 is : 1
> > 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_000001 and exit code: 1
> > ExitCodeException exitCode=1:
> >     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> >     at org.apache.hadoop.util.Shell.run(Shell.java:460)
> >     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> >     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
> >     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> >     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> >     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >     at java.lang.Thread.run(Thread.java:745)
> > 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
> > 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id: container_1457949199710_0001_01_000001
> > 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
> > ==================================================
> >
> > Unfortunately, the directory
> > /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
> > is empty; the log indicates that it is being deleted after the failed
> > attempt.
> >
> > Again, any hint would be useful, also regarding the activation of cgroups.
> >
> > Best regards,
> > Björn
> >
> > --
> > Dipl.-Inform. Björn Hagemeier
> > Federated Systems and Data
> > Juelich Supercomputing Centre
> > Institute for Advanced Simulation
> >
> > Phone: +49 2461 61 1584
> > Fax  : +49 2461 61 6656
> > Email: b.hageme...@fz-juelich.de
> > Skype: bhagemeier
> > WWW  : http://www.fz-juelich.de/jsc
> >
> > JSC is the coordinator of the
> > John von Neumann Institute for Computing
> > and member of the
> > Gauss Centre for Supercomputing
> >
> > -------------------------------------------------------------------------------------
> > Forschungszentrum Juelich GmbH
> > 52425 Juelich
> > Registered office: Juelich
> > Entered in the commercial register of the district court of Dueren, No. HR B 3498
> > Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
> > Board of Directors: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman),
> > Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
> > Prof. Dr. Sebastian M. Schmidt
> > -------------------------------------------------------------------------------------
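
As a rough aid for the suggestion at the top of this message (leftover cgroup settings in the NodeManager's yarn-site.xml), here is a minimal Python sketch that lists properties in a Hadoop-style configuration file whose name or value mentions cgroups or the LinuxContainerExecutor. The function name, the marker substrings, and the sample XML are assumptions made for illustration; they are not taken from Myriad or from the configuration discussed in this thread, and the marker list is not an exhaustive account of what YARN reads.

```python
import xml.etree.ElementTree as ET

# Lowercase substrings that suggest a property belongs to the cgroups /
# LinuxContainerExecutor setup. This list is an assumption for illustration.
MARKERS = ("cgroup", "linux-container-executor", "linuxcontainerexecutor")


def find_cgroup_properties(xml_text):
    """Return (name, value) pairs from a Hadoop-style <configuration>
    document whose name or value contains one of MARKERS (case-insensitive)."""
    root = ET.fromstring(xml_text)
    hits = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        haystack = (name + " " + value).lower()
        if any(marker in haystack for marker in MARKERS):
            hits.append((name, value))
    return hits


# Hypothetical sample, NOT the yarn-site.xml from this thread.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/var/lib/hadoop-yarn/cache/yarn/nm-local-dir</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name, value in find_cgroup_properties(SAMPLE):
        print(f"{name} = {value}")
```

To check a real node, one could read the file contents (for example with `open(path).read()`) and pass them to `find_cgroup_properties`. Any hits that remain while cgroups is disabled in myriad-config-default.yml would be candidates for the leftover settings mentioned above.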