Dear Darin,

Thanks for your response.

The precise content of /etc/mesos-slave/isolation is:

==================================================
cgroups/cpu,cgroups/mem
==================================================

I took this value from some documentation, possibly that of the Puppet
module I'm using [1]. Should the values be different? Your string looks
a bit different: "cpu/cgroups,memory/cgroups".
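
For what it's worth, my understanding (assuming the standard Mesosphere
packaging, where every file under /etc/mesos-slave/ is turned into a
command-line flag of the same name) is that this file simply becomes the
--isolation flag, so the slave should effectively be started roughly
like this:

==================================================
/usr/sbin/mesos-slave --isolation=cgroups/cpu,cgroups/mem ...
==================================================

If I read the Mesos documentation correctly, it uses the
"cgroups/cpu,cgroups/mem" form as well, i.e. the same components as in
your string, just in the opposite order.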

Please find my yarn-site.xml and myriad-config-default.yml attached. I
don't think they contain any sensitive information.
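
One more observation on the cgroups-enabled failure quoted below: as far
as I can tell, exit code 24 is container-executor's way of saying that
it could not read the group from its own container-executor.cfg,
independent of what yarn-site.xml says. A minimal sketch of that file
(the banned.users and min.user.id values are only illustrative, not
taken from my setup) would be:

==================================================
yarn.nodemanager.linux-container-executor.group=yarn
banned.users=hdfs,yarn,mapred,bin
min.user.id=1000
==================================================

As far as I know, the .cfg and its parent directories must be owned by
root, and the container-executor binary must be owned by root,
group-owned by that same group and have the setuid bit set; otherwise
the check in LinuxContainerExecutor.init() fails in a similar way.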


Best regards,
Björn

[1] https://github.com/deric/puppet-mesos

On 15.03.2016 at 16:46, Darin Johnson wrote:
> Hey Bjorn,
> 
> Can you copy paste the relevant part of the Myriad and yarn-site.xml?
> Also, can you ensure you are running the mesos-slave with
> --isolation="cpu/cgroups,memory/cgroups"?
> 
> I'll try to recreate the problem and/or tell you what's missing in the
> config.
> 
> Darin
> 
> On Mon, Mar 14, 2016 at 6:19 AM, Björn Hagemeier <b.hageme...@fz-juelich.de>
> wrote:
> 
>> Hi all,
>>
>> I have trouble starting the NM on the slave nodes. Apparently, it does
>> not find its configuration, or something is wrong with the configuration.
>>
>> With cgroups enabled, the NM does not start; the logs contain the
>> following, indicating that something is wrong in the configuration.
>> However, yarn.nodemanager.linux-container-executor.group is set (to
>> "yarn"). The value used to be
>> "${yarn.nodemanager.linux-container-executor.group}", as indicated by
>> the installation documentation, but I'm uncertain whether this
>> recursion is the correct approach.
>>
>>
>> ==================================================
>> 16/03/14 09:32:45 FATAL nodemanager.NodeManager: Error starting NodeManager
>> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
>>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:213)
>>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
>>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
>> Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
>>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:193)
>>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
>>         ... 3 more
>> Caused by: ExitCodeException exitCode=24: Can't get configured value for yarn.nodemanager.linux-container-executor.group.
>>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>>         at org.apache.hadoop.util.Shell.run(Shell.java:460)
>>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
>>         ... 4 more
>> ==================================================
>>
>>
>> I have given it another try with cgroups disabled (in
>> myriad-config-default.yml). I seem to get a little further, but am
>> still stuck when running YARN jobs:
>>
>> ==================================================
>> 16/03/14 10:56:34 INFO container.Container: Container container_1457949199710_0001_01_000001 transitioned from LOCALIZED to RUNNING
>> 16/03/14 10:56:34 INFO nodemanager.DefaultContainerExecutor: launchContainer: [bash, /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/application_1457949199710_0001/container_1457949199710_0001_01_000001/default_container_executor.sh]
>> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exit code from container container_1457949199710_0001_01_000001 is : 1
>> 16/03/14 10:56:34 WARN nodemanager.DefaultContainerExecutor: Exception from container-launch with container ID: container_1457949199710_0001_01_000001 and exit code: 1
>> ExitCodeException exitCode=1:
>>         at org.apache.hadoop.util.Shell.runCommand(Shell.java:543)
>>         at org.apache.hadoop.util.Shell.run(Shell.java:460)
>>         at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:720)
>>         at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:210)
>>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
>>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>>         at java.lang.Thread.run(Thread.java:745)
>> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exception from container-launch.
>> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Container id: container_1457949199710_0001_01_000001
>> 16/03/14 10:56:34 INFO nodemanager.ContainerExecutor: Exit code: 1
>> ==================================================
>>
>> Unfortunately, the directory
>> /var/lib/hadoop-yarn/cache/yarn/nm-local-dir/usercache/bjoernh/appcache/
>> is empty; the log indicates that it is deleted after the failed attempt.
>>
>> Again, any hint would be useful, also regarding the activation of cgroups.
>>
>>
>> Best regards,
>> Björn
>>
> 


-- 
Dipl.-Inform. Björn Hagemeier
Federated Systems and Data
Juelich Supercomputing Centre
Institute for Advanced Simulation

Phone: +49 2461 61 1584
Fax  : +49 2461 61 6656
Email: b.hageme...@fz-juelich.de
Skype: bhagemeier
WWW  : http://www.fz-juelich.de/jsc

JSC is the coordinator of the
John von Neumann Institute for Computing
and member of the
Gauss Centre for Supercomputing

-------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
-------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------

Attachment: myriad-config-default.yml
Description: application/yaml

<?xml version="1.0"?>
<!--
  Licensed to the Apache Software Foundation (ASF) under one or more
  contributor license agreements.  See the NOTICE file distributed with
  this work for additional information regarding copyright ownership.
  The ASF licenses this file to You under the Apache License, Version 2.0
  (the "License"); you may not use this file except in compliance with
  the License.  You may obtain a copy of the License at

      http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License.
-->
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

  <!-- List of directories to store localized files in. -->
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/var/lib/hadoop-yarn/cache/${user.name}/nm-local-dir</value>
  </property>
  <!-- Classpath for typical applications. -->
  <property>
    <name>yarn.application.classpath</name>
    <value>
        $HADOOP_CONF_DIR,
        $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
        $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
        $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
        $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>judge080.judge</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle,myriad_executor</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <!-- Where to store container logs -->
  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value>file:///var/log/hadoop-yarn/containers</value>
  </property>
  <!-- permitted nodes -->
  <property>
    <name>yarn.resourcemanager.nodes.include-path</name>
    <value>/etc/hadoop/conf/slaves</value>
  </property>
  <!-- decommissioning of the nodes -->
  <property>
    <name>yarn.resourcemanager.nodes.exclude-path</name>
    <value>/etc/hadoop/conf/excludes</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.myriad_executor.class</name>
    <value>org.apache.myriad.executor.MyriadExecutorAuxService</value>
  </property>
  <property>
    <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
    <value>2000</value>
  </property>
  <property>
    <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
    <value>10000</value>
  </property>
  <property>
    <name>yarn.resourcemanager.nm.liveness-monitor.interval-ms</name>
    <value>1000</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>${nodemanager.resource.cpu-vcores}</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>${nodemanager.resource.memory-mb}</value>
  </property>
  <property>
    <name>yarn.nodemanager.address</name>
    <value>${myriad.yarn.nodemanager.webapp.address}</value>
  </property>
  <property>
    <name>yarn.nodemanager.localizer.address</name>
    <value>${myriad.yarn.nodemanager.localizer.address}</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.myriad.scheduler.yarn.MyriadFairScheduler</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.nodemanager.container-executor.class</name>
    <value>${yarn.nodemanager.container-executor.class}</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
    <value>${yarn.nodemanager.linux-container-executor.resources-handler.class}</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
    <value>${yarn.nodemanager.linux-container-executor.cgroups.hierarchy}</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
    <value>${yarn.nodemanager.linux-container-executor.cgroups.mount}</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
    <value>${yarn.nodemanager.linux-container-executor.cgroups.mount-path}</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.group</name>
    <value>yarn</value>
  </property>
  <property>
    <name>yarn.nodemanager.linux-container-executor.path</name>
    <value>${yarn.home}/bin/container-executor</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>24</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>96508</value>
  </property>

</configuration>


