Bjorn,

Your isolation configuration is correct; I was going from memory. I'll take a look at your configs a little later in my test environment and see what I can come up with.
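To make the difference between the two isolation strings in this thread concrete, here is a minimal Python sketch. It is not part of Mesos: the helper name `check_isolation` and the short allow-list are assumptions made for illustration, and the list covers only the isolator names relevant to this discussion, not everything Mesos supports.

```python
# Hypothetical allow-list: only the isolator names relevant to this thread
# plus the POSIX equivalents. This is NOT a complete list of Mesos isolators.
KNOWN_ISOLATORS = {"cgroups/cpu", "cgroups/mem", "posix/cpu", "posix/mem"}


def check_isolation(value):
    """Split a comma-separated isolation value into (valid, unknown) names."""
    names = [n.strip() for n in value.split(",") if n.strip()]
    valid = [n for n in names if n in KNOWN_ISOLATORS]
    unknown = [n for n in names if n not in KNOWN_ISOLATORS]
    return valid, unknown


if __name__ == "__main__":
    # First string: content of /etc/mesos-slave/isolation from the thread.
    # Second string: the reversed form that was quoted from memory.
    for candidate in ("cgroups/cpu,cgroups/mem", "cpu/cgroups,memory/cgroups"):
        valid, unknown = check_isolation(candidate)
        status = "ok" if not unknown else "unknown: " + ", ".join(unknown)
        print(candidate, "->", status)
```

Run against the two strings, the sketch accepts `cgroups/cpu,cgroups/mem` and flags both entries of the reversed form as unknown.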
Darin

On Tue, Mar 15, 2016 at 12:07 PM, Björn Hagemeier <b.hageme...@fz-juelich.de> wrote:

> Dear Darin,
>
> thanks for your response.
>
> The precise content of /etc/mesos-slave/isolation is:
>
> ==================================================
> cgroups/cpu,cgroups/mem
> ==================================================
>
> I took this from some documentation; it may have been that of the
> Puppet module I'm using [1]. Should the values be different? Your string
> looks a bit different: "cpu/cgroups,memory/cgroups".
>
> Please find my yarn-site.xml and myriad-config-default.yml attached. I
> don't think they contain any sensitive information.
>
>
> Best regards,
> Björn
>
> [1] https://github.com/deric/puppet-mesos
>
> On 15.03.2016 at 16:46, Darin Johnson wrote:
> > Hey Bjorn,
> >
> > Can you copy and paste the relevant parts of the Myriad config and
> > yarn-site.xml? Also, can you ensure you are running the mesos-slave with
> > --isolation="cpu/cgroups,memory/cgroups"?
> >
> > I'll try to recreate the problem and/or tell you what's missing in the
> > config.
> >
> > Darin
> >
> > On Mon, Mar 14, 2016 at 6:19 AM, Björn Hagemeier <b.hageme...@fz-juelich.de>
> > wrote:
> >
> >> Hi all,
> >>
> >> I have trouble starting the NM on the slave nodes. Apparently, it does
> >> not find its configuration, or something is wrong with the configuration.
> >>
> >> With cgroups enabled, the NM does not start; the logs contain the
> >> following, indicating that there is something wrong in the configuration.
> >> However, yarn.nodemanager.linux-container-executor.group is set (to
> >> "yarn"). The value used to be
> >> "${yarn.nodemanager.linux-container-executor.group}" as indicated by the
> >> installation documentation; however, I'm uncertain whether this
> >> recursion is the correct approach.
> >>
> >>
> >> ==================================================
> >> 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
> >> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
> >>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
> >>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> >>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
> >>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
> >> Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
> >>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
> >>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
> >>         ... 3 more
> >> Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.
> >>
> >>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> >>         at org.apache.hadoop.util.Shell.run(Shell.java:460)
> >>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> >>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
> >>         ... 4 more
> >> ==================================================
> >>
> >>
> >> I have given it another try with cgroups disabled (in
> >> myriad-config-default.yml). I seem to get a little further, but I am
> >> still stuck at running YARN jobs:
> >>
> >> ==================================================
> >> 16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_000001 transitioned from LOCALIZED to RUNNING
> >> 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_000001/default_container_executor.sh]
> >> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_000001 is : 1
> >> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_000001 and exit code: 1
> >> ExitCodeException exitCode=1:
> >>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
> >>         at org.apache.hadoop.util.Shell.run(Shell.java:460)
> >>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
> >>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
> >>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
> >>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> >>         at java.lang.Thread.run(Thread.java:745)
> >> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
> >> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id: container_1457949199710_0001_01_000001
> >> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
> >> ==================================================
> >>
> >> Unfortunately, the directory
> >> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
> >> is empty; the log indicates that it is being deleted after the failed
> >> attempt.
> >>
> >> Again, any hint would be useful, also regarding the activation of
> >> cgroups.
> >>
> >>
> >> Best regards,
> >> Björn
> >>
> >> --
> >> Dipl.-Inform. Björn Hagemeier
> >> Federated Systems and Data
> >> Juelich Supercomputing Centre
> >> Institute for Advanced Simulation
> >>
> >> Phone: +49 2461 61 1584
> >> Fax  : +49 2461 61 6656
> >> Email: b.hageme...@fz-juelich.de
> >> Skype: bhagemeier
> >> WWW  : http://www.fz-juelich.de/jsc
> >>
> >> JSC is the coordinator of the
> >> John von Neumann Institute for Computing
> >> and member of the
> >> Gauss Centre for Supercomputing
> >>
> >> -------------------------------------------------------------------------------------
> >> Forschungszentrum Juelich GmbH
> >> 52425 Juelich
> >> Registered office: Juelich
> >> Registered in the commercial register of the district court of Dueren, No. HR B 3498
> >> Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
> >> Board of Directors: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman),
> >> Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
> >> Prof. Dr. Sebastian M. Schmidt
> >> -------------------------------------------------------------------------------------
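
On the question raised in the original message, whether the self-referencing value "${yarn.nodemanager.linux-container-executor.group}" is the right approach: a minimal Python sketch like the following can at least list every property in a yarn-site.xml whose value refers to its own name. The function name `self_referencing_properties` and the sample XML fragment are assumptions made for illustration; the fragment is not the attached file. Whether such a placeholder resolves depends on something outside the file supplying the value, which this sketch does not check.

```python
import re
import xml.etree.ElementTree as ET

# Matches ${some.property.name} placeholders inside a value.
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def self_referencing_properties(xml_text):
    """Return names of <property> entries whose value contains ${<own name>}."""
    root = ET.fromstring(xml_text)
    flagged = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = prop.findtext("value") or ""
        if name and name in PLACEHOLDER.findall(value):
            flagged.append(name)
    return flagged


# Hypothetical fragment for illustration only; not the real yarn-site.xml.
SAMPLE = """<configuration>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>${yarn.nodemanager.linux-container-executor.group}</value>
  </property>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for name in self_referencing_properties(SAMPLE):
        print("self-referencing:", name)
```

To check a real file, read it and pass its text to the function, for example `self_referencing_properties(open("yarn-site.xml").read())`.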