What does your container-executor.cfg look like? It seems that yarn.nodemanager.linux-container-executor.group isn't set, or possibly banned.users= hasn't been set (on some distros).
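As a rough aid for checking this, here is a minimal Python sketch that scans the text of a container-executor.cfg for the two keys mentioned above. The function names and the sample content are made up for illustration, and the list of keys checked is an assumption based only on this thread; the container-executor binary does its own, stricter validation.

```python
# Hypothetical helper, not part of Hadoop or Myriad.
# Keys to look for; this list is an assumption taken from the discussion above.
GROUP_KEY = "yarn.nodemanager.linux-container-executor.group"
REQUIRED_KEYS = (GROUP_KEY, "banned.users")


def parse_cfg(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        entries[key.strip()] = value.strip()
    return entries


def check_cfg(text):
    """Return a list of human-readable problems found in the cfg text."""
    entries = parse_cfg(text)
    problems = []
    for key in REQUIRED_KEYS:
        if key not in entries:
            problems.append(key + " is missing")
    # The group key must also carry a value, e.g. "yarn".
    if GROUP_KEY in entries and not entries[GROUP_KEY]:
        problems.append(GROUP_KEY + " is empty")
    return problems


# Illustrative content only, not taken from anyone's real configuration.
SAMPLE = """\
yarn.nodemanager.linux-container-executor.group=yarn
min.user.id=1000
"""

if __name__ == "__main__":
    for problem in check_cfg(SAMPLE):
        print(problem)
```

To run it against a real node, read the file's contents into a string and pass that to check_cfg; the location of container-executor.cfg depends on the distribution.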

On Tue, Mar 15, 2016 at 12:52 PM, Darin Johnson <dbjohnson1...@gmail.com> wrote:

> Bjorn,
>
> Your isolation configuration is correct, I was going from memory. I'll
> take a look at your configs a little later on my test environment and see
> what I can come up with.
>
> Darin
>
> On Tue, Mar 15, 2016 at 12:07 PM, Björn Hagemeier <
> b.hageme...@fz-juelich.de> wrote:
>
>> Dear Darin,
>>
>> thanks for your response.
>>
>> The precise content of /etc/mesos-slave/isolation is:
>>
>> ==================================================
>> cgroups/cpu,cgroups/mem
>> ==================================================
>>
>> Which I took from some documentation, it may have been that of the
>> Puppet module I'm using [1]. Should the values be different? Your string
>> looks a bit different: "cpu/cgroups,memory/cgroups".
>>
>> Please find my yarn-site.xml and myriad-config-default.yml attached. I
>> don't think they contain any sensitive information.
>>
>>
>> Best regards,
>> Björn
>>
>> [1] https://github.com/deric/puppet-mesos
>>
>> On 15.03.2016 at 16:46, Darin Johnson wrote:
>> > Hey Bjorn,
>> >
>> > Can you copy paste the relevant part of the Myriad and yarn-site.xml?
>> > Also, can you ensure you are running the mesos-slave with
>> > --isolation="cpu/cgroups,memory/cgroups".
>> >
>> > I'll try to recreate the problem and/or tell you what's missing in the
>> > config.
>> >
>> > Darin
>> >
>> > On Mon, Mar 14, 2016 at 6:19 AM, Björn Hagemeier <b.hageme...@fz-juelich.de>
>> > wrote:
>> >
>> >> Hi all,
>> >>
>> >> I have trouble starting the NM on the slave nodes. Apparently, it does
>> >> not find its configuration or something is wrong with the configuration.
>> >>
>> >> With cgroups enabled, the NM does not start; the logs contain the
>> >> following, indicating that there is something wrong in the configuration.
>> >> However, yarn.nodemanager.linux-container-executor.group is set (to
>> >> "yarn"). The value used to be
>> >> "${yarn.nodemanager.linux-container-executor.group}" as indicated by the
>> >> installation documentation, however I'm uncertain whether this recursion
>> >> is the correct approach.
>> >>
>> >>
>> >> ==================================================
>> >> 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
>> >> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
>> >>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
>> >>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>> >>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
>> >>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
>> >> Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
>> >>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
>> >>     at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
>> >>     ... 3 more
>> >> Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.
>> >>
>> >>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>> >>     at org.apache.hadoop.util.Shell.run(Shell.java:460)
>> >>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>> >>     at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
>> >>     ... 4 more
>> >> ==================================================
>> >>
>> >>
>> >> I have given it another try with cgroups disabled (in
>> >> myriad-config-default.yml); I seem to get a little further, but am still
>> >> stuck at running Yarn jobs:
>> >>
>> >> ==================================================
>> >> 16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_000001 transitioned from LOCALIZED to RUNNING
>> >> 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_000001/default_container_executor.sh]
>> >> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_000001 is : 1
>> >> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_000001 and exit code: 1
>> >> ExitCodeException exitCode=1:
>> >>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>> >>     at org.apache.hadoop.util.Shell.run(Shell.java:460)
>> >>     at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>> >>     at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
>> >>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>> >>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>> >>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> >>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> >>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> >>     at java.lang.Thread.run(Thread.java:745)
>> >> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
>> >> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id: container_1457949199710_0001_01_000001
>> >> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
>> >> ==================================================
>> >>
>> >> Unfortunately, the directory
>> >> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
>> >> is empty; the log indicates that it is being deleted after the failed
>> >> attempt.
>> >>
>> >> Again, any hint would be useful, also regarding the activation of cgroups.
>> >>
>> >>
>> >> Best regards,
>> >> Björn
>> >>
>> >> --
>> >> Dipl.-Inform. Björn Hagemeier
>> >> Federated Systems and Data
>> >> Juelich Supercomputing Centre
>> >> Institute for Advanced Simulation
>> >>
>> >> Phone: +49 2461 61 1584
>> >> Fax  : +49 2461 61 6656
>> >> Email: b.hageme...@fz-juelich.de
>> >> Skype: bhagemeier
>> >> WWW  : http://www.fz-juelich.de/jsc
>> >>
>> >> JSC is the coordinator of the
>> >> John von Neumann Institute for Computing
>> >> and member of the
>> >> Gauss Centre for Supercomputing
>> >>
>> >> -------------------------------------------------------------------------------------
>> >> Forschungszentrum Juelich GmbH
>> >> 52425 Juelich
>> >> Registered office: Juelich
>> >> Registered in the commercial register of the district court of Dueren, No. HR B 3498
>> >> Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
>> >> Board of Directors: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman),
>> >> Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
>> >> Prof. Dr. Sebastian M. Schmidt
>> >> -------------------------------------------------------------------------------------