What does your container-executor.cfg look like?  It seems that
yarn.nodemanager.linux-container-executor.group isn't set there, or possibly
banned.users= hasn't been set (needed on some distros).
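
A quick way to check this is a small script. The following is only a minimal
sketch: it assumes container-executor.cfg consists of plain key=value lines
with optional '#' comments, and it treats both keys named above as required,
which may not hold for every Hadoop version or distro. The helper names
(parse_cfg, missing_keys) are made up for illustration. It also flags a value
left as an unexpanded ${...} placeholder.

```python
# Keys mentioned in this thread; treating both as required is an assumption.
REQUIRED_KEYS = (
    "yarn.nodemanager.linux-container-executor.group",
    "banned.users",
)


def parse_cfg(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text, required=REQUIRED_KEYS):
    """Return required keys that are absent, empty, or still a ${...} placeholder."""
    values = parse_cfg(text)
    problems = []
    for key in required:
        value = values.get(key, "")
        # Flagging an empty value is deliberately conservative.
        if not value or value.startswith("${"):
            problems.append(key)
    return problems
```

To use it, read the file from wherever your distro installs it and pass the
contents in, e.g. missing_keys(open(path).read()); an empty list means both
keys have a value.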

On Tue, Mar 15, 2016 at 12:52 PM, Darin Johnson <dbjohnson1...@gmail.com>
wrote:

> Bjorn,
>
> Your isolation configuration is correct; I was going from memory.  I'll
> take a look at your configs a little later on my test environment and see
> what I can come up with.
>
> Darin
>
> On Tue, Mar 15, 2016 at 12:07 PM, Björn Hagemeier <
> b.hageme...@fz-juelich.de> wrote:
>
>> Dear Darin,
>>
>> thanks for your response.
>>
>> The precise content of /etc/mesos-slave/isolation is:
>>
>> ==================================================
>> cgroups/cpu,cgroups/mem
>> ==================================================
>>
>> I took this from some documentation, possibly that of the Puppet
>> module I'm using [1]. Should the values be different? Your string
>> looks a bit different: "cpu/cgroups,memory/cgroups".
>>
>> Please find my yarn-site.xml and myriad-config-default.yml attached. I
>> don't think they contain any sensitive information.
>>
>>
>> Best regards,
>> Björn
>>
>> [1] https://github.com/deric/puppet-mesos
>>
>> On 15.03.2016 at 16:46, Darin Johnson wrote:
>> > Hey Bjorn,
>> >
>> > Can you copy paste the relevant part of the Myriad and yarn-site.xml?
>> > Also, can you ensure you are running the mesos-slave with
>> > --isolation="cpu/cgroups,memory/cgroups"?
>> >
>> > I'll try to recreate the problem and/or tell you what's missing in the
>> > config.
>> >
>> > Darin
>> >
>> > On Mon, Mar 14, 2016 at 6:19 AM, Björn Hagemeier
>> > <b.hageme...@fz-juelich.de> wrote:
>> >
>> >> Hi all,
>> >>
>> >> I have trouble starting the NM on the slave nodes. Apparently, it does
>> >> not find its configuration, or something is wrong with the configuration.
>> >>
>> >> With cgroups enabled, the NM does not start. The logs contain the error
>> >> below, indicating that something is wrong in the configuration. However,
>> >> yarn.nodemanager.linux-container-executor.group is set (to "yarn"). The
>> >> value used to be "${yarn.nodemanager.linux-container-executor.group}", as
>> >> indicated by the installation documentation; however, I'm uncertain
>> >> whether this recursion is the correct approach.
>> >>
>> >>
>> >> ==================================================
>> >> 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
>> >> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
>> >>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
>> >>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>> >>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
>> >>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
>> >> Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
>> >>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
>> >>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
>> >>         ... 3 more
>> >> Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.
>> >>
>> >>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>> >>         at org.apache.hadoop.util.Shell.run(Shell.java:460)
>> >>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>> >>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
>> >>         ... 4 more
>> >> ==================================================
>> >>
>> >>
>> >> I have given it another try with cgroups disabled (in
>> >> myriad-config-default.yml). I get a little further, but I am still
>> >> stuck at running YARN jobs:
>> >>
>> >> ==================================================
>> >> 16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_000001 transitioned from LOCALIZED to RUNNING
>> >> 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_000001/default_container_executor.sh]
>> >> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_000001 is : 1
>> >> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_000001 and exit code: 1
>> >> ExitCodeException exitCode=1:
>> >>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>> >>         at org.apache.hadoop.util.Shell.run(Shell.java:460)
>> >>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>> >>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
>> >>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>> >>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>> >>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>> >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>> >>         at java.lang.Thread.run(Thread.java:745)
>> >> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
>> >> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id: container_1457949199710_0001_01_000001
>> >> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
>> >> ==================================================
>> >>
>> >> Unfortunately, the directory
>> >> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
>> >> is empty; the log indicates that it is deleted after the failed
>> >> attempt.
>> >>
>> >> Again, any hint would be useful, also regarding the activation of cgroups.
>> >>
>> >>
>> >> Best regards,
>> >> Björn
>> >>
>> >> --
>> >> Dipl.-Inform. Björn Hagemeier
>> >> Federated Systems and Data
>> >> Juelich Supercomputing Centre
>> >> Institute for Advanced Simulation
>> >>
>> >> Phone: +49 2461 61 1584
>> >> Fax  : +49 2461 61 6656
>> >> Email: b.hageme...@fz-juelich.de
>> >> Skype: bhagemeier
>> >> WWW  : http://www.fz-juelich.de/jsc
>> >>
>> >> JSC is the coordinator of the
>> >> John von Neumann Institute for Computing
>> >> and member of the
>> >> Gauss Centre for Supercomputing
>> >>
>> >>
>> >> -------------------------------------------------------------------------------------
>> >> Forschungszentrum Juelich GmbH
>> >> 52425 Juelich
>> >> Registered office: Juelich
>> >> Registered in the commercial register of the district court of Dueren, No. HR B 3498
>> >> Chairman of the Supervisory Board: MinDir Dr. Karl Eugen Huthmacher
>> >> Board of Directors: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman),
>> >> Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt,
>> >> Prof. Dr. Sebastian M. Schmidt
>> >>
>> >> -------------------------------------------------------------------------------------
>> >>
>> >
>>
>>
>
