So, thanks everyone for the help on the NMInstances bug. I am now getting a different
issue with Myriad: a permissions error with the remote tarball distribution.
In my old setup (Hadoop 2.5.0, MapR 4.1, some pre-incubator version of
Myriad), I would run with a config containing:
nodemanager:
  jvmMaxMemoryMB: 1024 # Xmx for NM JVM process.
  user: mapr # The user to run NM process as.
  cpus: 0.2 # CPU needed by NM process.
  cgroups: false # Whether NM should support CGroups. If set to 'true', myriad automatically
                 # configures yarn-site.xml to attach YARN's cgroups under Me
So, user: mapr. I realized that this no longer works in the version I
just cloned; the error message made it clear that this is no longer an
accepted item.
frameworkUser: mapr # Should be the same user running the resource manager.
frameworkSuperUser: darkness # Must be root or have passwordless sudo on all nodes!
So these are the settings I use. I also run the resource manager in
Marathon with "user": "mapr".
So I see three different places to set users. darkness is a superuser with
passwordless sudo, as requested. mapr is my cluster user; it worked
before, and I run the resource manager as that user in Marathon. Myriad
spins up fine, but when it tries to kick off a NodeManager, I get the
error below. Note: user 700 is the mapr user.
Any thoughts on which user I should run this as would be appreciated!
STARTUP_MSG: build = [email protected]:mapr/private-hadoop-common.git -r 5264b1d5c5c2a849ee0eb09cfcbbed19fb0bfb53; compiled by 'root' on 2015-07-02T23:46Z
STARTUP_MSG: java = 1.8.0_45-internal
************************************************************/
15/08/19 07:01:09 INFO nodemanager.NodeManager: registered UNIX signal handlers for [TERM, HUP, INT]
15/08/19 07:01:10 WARN nodemanager.LinuxContainerExecutor: Exit code from container executor initialization is : 24
ExitCodeException exitCode=24: File /tmp/mesos/slaves/20150818-152209-1677764800-5050-22280-S2/frameworks/20150818-152209-1677764800-5050-22280-0000/executors/myriad_executor20150818-152209-1677764800-5050-22280-S2/runs/1d06f1f1-02a9-4413-80de-7393e9e0935e must be owned by root, but is owned by 700
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
15/08/19 07:01:10 INFO nodemanager.ContainerExecutor:
15/08/19 07:01:10 INFO service.AbstractService: Service NodeManager failed in state INITED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
    ... 3 more
Caused by: ExitCodeException exitCode=24: File /tmp/mesos/slaves/20150818-152209-1677764800-5050-22280-S2/frameworks/20150818-152209-1677764800-5050-22280-0000/executors/myriad_executor20150818-152209-1677764800-5050-22280-S2/runs/1d06f1f1-02a9-4413-80de-7393e9e0935e must be owned by root, but is owned by 700
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
    ... 4 more
15/08/19 07:01:10 WARN service.AbstractService: When stopping the service NodeManager : java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:162)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:274)
    at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
    at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
    at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
15/08/19 07:01:10 FATAL nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
    ... 3 more
Caused by: ExitCodeException exitCode=24: File /tmp/mesos/slaves/20150818-152209-1677764800-5050-22280-S2/frameworks/20150818-152209-1677764800-5050-22280-0000/executors/myriad_executor20150818-152209-1677764800-5050-22280-S2/runs/1d06f1f1-02a9-4413-80de-7393e9e0935e must be owned by root, but is owned by 700
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    at org.apache.hadoop.util.Shell.run(Shell.java:456)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
    ... 4 more
15/08/19 07:01:10 INFO nodemanager.NodeManager: SHUTDOWN_MSG:
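My reading of the log (not verified against the Hadoop or Myriad source) is that the container-executor binary behind LinuxContainerExecutor requires the path it checks, and probably its parent directories as well, to be owned by root, while the Mesos sandbox run directory here is owned by uid 700 (mapr). To see which directories on a slave would fail that kind of check, here is a rough Java sketch that walks from a given path up to / and lists every entry whose owner is not root. The class and method names are made up for illustration; this is not Hadoop's or Myriad's code, and it only compares owner names, not any permission bits container-executor may also check.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: approximates the "must be owned by root" ownership
// check reported above (exit code 24). Owner names only; permission bits
// are NOT checked. Not Hadoop or Myriad code.
final class RootOwnershipCheck {

    /** Returns the path itself, then each parent, up to and including "/". */
    static List<Path> ancestors(Path start) {
        List<Path> chain = new ArrayList<>();
        for (Path p = start.toAbsolutePath().normalize(); p != null; p = p.getParent()) {
            chain.add(p);
        }
        return chain;
    }

    /** Returns every path in the chain whose owner (per ownerOf) is not "root". */
    static List<Path> notOwnedByRoot(Path start, Function<Path, String> ownerOf) {
        List<Path> offenders = new ArrayList<>();
        for (Path p : ancestors(start)) {
            if (!"root".equals(ownerOf.apply(p))) {
                offenders.add(p);
            }
        }
        return offenders;
    }

    /** Looks up the real owner name of a path on this machine. */
    static String realOwner(Path p) {
        try {
            return Files.getOwner(p).getName();
        } catch (IOException e) {
            return "<unreadable: " + e.getMessage() + ">";
        }
    }

    public static void main(String[] args) {
        // Path to inspect, e.g. the sandbox "runs/..." directory from the error.
        Path target = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : notOwnedByRoot(target, RootOwnershipCheck::realOwner)) {
            System.out.println("not owned by root: " + p + " (owner: " + realOwner(p) + ")");
        }
    }
}
```

On a slave, something like java RootOwnershipCheck.java followed by the runs/... directory from the error (single-file launch needs Java 11+; on Java 8, javac it first) would print each offending directory. The owner lookup is passed in as a function so the walk can be exercised without touching real files.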