Sarjeet: Can you try adding this to your yarn-site.xml:
<property>
  <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
  <value>${yarn.nodemanager.linux-container-executor.cgroups.hierarchy}</value>
</property>

This should change the hierarchy to /sys/fs/cgroup/cpu/mesos/XXX-TASK-ID-XXX, which will be writable, and it explains the error:

Caused by: java.io.IOException: Not able to enforce cpu weights; cannot write to cgroup at: /sys/fs/cgroup/cpu

The node manager will now add tasks to /sys/fs/cgroup/cpu/mesos/XXX-TASK-ID-XXX. I'll go check to make sure that's in the documentation.

Thanks,
Darin

On Sat, May 21, 2016 at 4:56 AM, Sarjeet Singh <sarjeetsi...@maprtech.com> wrote:
> When trying cgroups on the myriad-0.2 RC on a single-node MapR cluster, I am
> getting the following issue:
>
> 1. The errors below occur when launching the NodeManager with cgroups enabled:
>
> *stdout*:
>
> export TASK_DIR=afe954c5-79dc-4238-af84-14855090df34 && sudo chown mapr /sys/fs/cgroup/cpu/mesos/afe954c5-79dc-4238-af84-14855090df34 && export YARN_HOME=/opt/mapr/hadoop/hadoop-2.7.0; env YARN_NODEMANAGER_OPTS=-Dcluster.name.prefix=/cluster1 -Dnodemanager.resource.io-spindles=4.0 -Dyarn.nodemanager.linux-container-executor.cgroups.hierarchy=mesos/afe954c5-79dc-4238-af84-14855090df34 -Dyarn.home=/opt/mapr/hadoop/hadoop-2.7.0 -Dnodemanager.resource.cpu-vcores=4 -Dnodemanager.resource.memory-mb=4096 -Dmyriad.yarn.nodemanager.address=0.0.0.0:31847 -Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31132 -Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31181 -Dmyriad.mapreduce.shuffle.port=31166 YARN_HOME=/opt/mapr/hadoop/hadoop-2.7.0 /opt/mapr/hadoop/hadoop-2.7.0/bin/yarn nodemanager
>
> *stderr*:
>
> 16/05/21 01:43:13 INFO service.AbstractService: Service NodeManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
>         at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:214)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
> Caused by: java.io.IOException: Not able to enforce cpu weights; cannot write to cgroup at: /sys/fs/cgroup/cpu
>         at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.initializeControllerPaths(CgroupsLCEResourcesHandler.java:493)
>         at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:152)
>         at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:135)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:192)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
>         ...
> 3 more
>
> 16/05/21 01:43:13 WARN service.AbstractService: When stopping the service NodeManager : java.lang.NullPointerException
> java.lang.NullPointerException
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:164)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:276)
>         at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>         at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>         at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
>
> 16/05/21 01:43:13 FATAL nodemanager.NodeManager: Error starting NodeManager
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:214)
>         at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:476)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:524)
> Caused by: java.io.IOException: Not able to enforce cpu weights; cannot write to cgroup at: /sys/fs/cgroup/cpu
>         at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.initializeControllerPaths(CgroupsLCEResourcesHandler.java:493)
>         at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:152)
>         at org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler.init(CgroupsLCEResourcesHandler.java:135)
>         at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:192)
>         at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
>         ... 3 more
>
> 16/05/21 01:43:13 INFO nodemanager.NodeManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NodeManager at qa101-139/10.10.101.139
> ************************************************************/
>
> Here is the yarn-site.xml configuration:
>
> <configuration>
>   <!-- Site specific YARN configuration properties -->
>   <property>
>     <name>yarn.resourcemanager.hostname</name>
>     <value>testrm.marathon.mesos</value>
>     <description>host is the hostname of the resourcemanager</description>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.recovery.enabled</name>
>     <value>true</value>
>     <description>RM Recovery Enabled</description>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.scheduler.class</name>
>     <value>org.apache.myriad.scheduler.yarn.MyriadFairScheduler</value>
>     <description>One can configure other schedulers as well from the following list: org.apache.myriad.scheduler.yarn.MyriadCapacityScheduler, org.apache.myriad.scheduler.yarn.MyriadFifoScheduler</description>
>   </property>
>   <property>
>     <name>yarn.nodemanager.resource.cpu-vcores</name>
>     <value>${nodemanager.resource.cpu-vcores}</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.resource.memory-mb</name>
>     <value>${nodemanager.resource.memory-mb}</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.address</name>
>     <value>${myriad.yarn.nodemanager.address}</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.webapp.address</name>
>     <value>${myriad.yarn.nodemanager.webapp.address}</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.webapp.https.address</name>
>     <value>${myriad.yarn.nodemanager.webapp.address}</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.localizer.address</name>
>     <value>${myriad.yarn.nodemanager.localizer.address}</value>
>   </property>
>   <property>
>     <name>mapreduce.shuffle.port</name>
>     <value>${myriad.mapreduce.shuffle.port}</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.aux-services</name>
>     <value>mapreduce_shuffle,mapr_direct_shuffle,myriad_executor</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.aux-services.myriad_executor.class</name>
>     <value>org.apache.myriad.executor.MyriadExecutorAuxService</value>
>   </property>
>   <property>
>     <name>yarn.resourcemanager.store.class</name>
>     <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.MyriadFileSystemRMStateStore</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-mb</name>
>     <value>0</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-vcores</name>
>     <value>0</value>
>   </property>
>   <property>
>     <name>yarn.scheduler.minimum-allocation-disks</name>
>     <value>0</value>
>   </property>
>   <!-- Cgroups configuration -->
>   <property>
>     <description>Who will execute (launch) the containers.</description>
>     <name>yarn.nodemanager.container-executor.class</name>
>     <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>
>   </property>
>   <property>
>     <description>The class which should help the LCE handle resources.
>     </description>
>     <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
>     <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.linux-container-executor.group</name>
>     <value>mapr</value>
>   </property>
>   <property>
>     <name>yarn.nodemanager.linux-container-executor.path</name>
>     <value>/opt/mapr/hadoop/hadoop-2.7.0/bin/container-executor</value>
>   </property>
> </configuration>
>
> Here is the *myriad-config-default.yml*:
>
> mesosMaster: zk://10.10.101.139:5181/mesos
> checkpoint: false
> frameworkFailoverTimeout: 43200000
> frameworkName: MyriadAlpha
> frameworkRole:
> frameworkUser: mapr  # running the resource manager.
> frameworkSuperUser: root  # To be deprecated; currently permissions need to be set by a superuser due to MESOS-1790. Must be root or have passwordless sudo. Required if nodeManagerUri is set, ignored otherwise.
> nativeLibrary: /usr/local/lib/libmesos.so
> zkServers: 10.10.101.139:5181
> zkTimeout: 20000
> restApiPort: 8192
> profiles:
>   zero:  # NMs launched with this profile dynamically obtain cpu/mem from Mesos
>     cpu: 0
>     mem: 0
>     spindles: 0
>   small:
>     cpu: 2
>     mem: 2048
>     spindles: 1
>   medium:
>     cpu: 4
>     mem: 4096
>     spindles: 2
>   large:
>     cpu: 10
>     mem: 12288
>     spindles: 4
> nmInstances:  # NMs to start with. Requires at least 1 NM with a non-zero profile.
>   medium: 1  # <profile_name : instances>
> rebalancer: false
> haEnabled: true
> nodemanager:
>   jvmMaxMemoryMB: 1024
>   cpus: 0.2
>   cgroups: true
> executor:
>   jvmMaxMemoryMB: 256
>   path: file:///opt/mapr/myriad/myriad-0.1/lib/myriad-executor-runnable-0.1.0.jar
>   # The following should be used for a remotely distributed URI; hdfs assumed but other URI types are valid.
>   #nodeManagerUri: hdfs://namenode:port/dist/hadoop-2.7.0.tar.gz
>   #path: file:///opt/mapr/myriad/myriad-0.1/lib/myriad-executor-runnable-0.1.0.jar
> yarnEnvironment:
>   YARN_NODEMANAGER_OPTS: -Dcluster.name.prefix=/cluster1 -Dnodemanager.resource.io-spindles=4.0
>   YARN_HOME: /opt/mapr/hadoop/hadoop-2.7.0
>   #JAVA_HOME: /usr/lib/jvm/java-default  # System dependent, but sometimes necessary
> mesosAuthenticationPrincipal:
> mesosAuthenticationSecretFilename:
> services:
>   jobhistory:
>     jvmMaxMemoryMB: 64
>     cpus: 0.5
>     ports:
>       myriad.mapreduce.jobhistory.admin.address: 10033
>       myriad.mapreduce.jobhistory.address: 10020
>       myriad.mapreduce.jobhistory.webapp.address: 19888
>     envSettings: -Dcluster.name.prefix=/cluster1
>     taskName: jobhistory
>     serviceOptsName: HADOOP_JOB_HISTORYSERVER_OPTS
>     command: $YARN_HOME/bin/mapred historyserver
>     maxInstances: 1
>
> I have also fixed some of NMExecutorCLGenImpl.java for the NM command line, but the issue remains the same. Let me know if there is any issue with the setup, or if I missed any configuration details from the Myriad perspective.
>
> -Sarjeet
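For anyone else debugging this: the root cause is simply that CgroupsLCEResourcesHandler needs write access to the configured cgroup hierarchy before it can create per-container groups and write cpu.shares. A rough sketch of that precondition as a shell check (the paths here are placeholders, and a throwaway directory stands in for /sys/fs/cgroup/cpu/mesos/<task-id> so the script runs anywhere):

```shell
#!/bin/sh
# Stand-in for the task-specific hierarchy; on a real node this would be
# something like /sys/fs/cgroup/cpu/mesos/<task-id>.
CGROUP_DIR=$(mktemp -d)

# Before the NodeManager starts, the executor effectively runs:
#   sudo chown <yarn-user> <cgroup-dir>
# If that step is skipped, or the hierarchy points at the root controller
# (/sys/fs/cgroup/cpu), the writability check below fails and the
# NodeManager dies with:
#   java.io.IOException: Not able to enforce cpu weights; cannot write to cgroup
if [ -w "$CGROUP_DIR" ]; then
    echo "cgroup path is writable; YARN can enforce cpu weights"
else
    echo "cgroup path is NOT writable; expect the IOException above" >&2
fi

rmdir "$CGROUP_DIR"
```

Pointing yarn.nodemanager.linux-container-executor.cgroups.hierarchy at the task-specific directory (which the executor chowns to the framework user) is what makes this check pass.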