So, focusing on this issue: to run Myriad at this point, we would need to

1. Run Myriad as root (i.e. add "user": "root" to the Marathon JSON so the task runs as root).
2. Have the frameworkUser be root.
3. Have the frameworkSuperUser either be root or be someone who can passwordlessly sudo to root.
4. Have the entire path to the slave work-dir, up to where container-executor.cfg ends up, be owned by root and writable only by root.
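Concretely, as far as I can tell that adds up to something like the following on every slave. Treat it as a rough sketch, not official guidance; the paths and the work-dir value are just from my own setup.

# Marathon app definition for the Myriad resource manager: add
#   "user": "root"
# myriad-config-default.yml: both users resolve to root (or, for
# frameworkSuperUser, a user with passwordless sudo to root):
#   frameworkUser: root
#   frameworkSuperUser: root
# Slave work-dir: every directory from / down to where container-executor.cfg
# lands must be owned by root and writable only by root (/ itself normally
# already is), e.g. for my work dir:
sudo chown root:root /opt /opt/mapr /opt/mapr/mesos /opt/mapr/mesos/tmp /opt/mapr/mesos/tmp/slave
sudo chmod 755 /opt /opt/mapr /opt/mapr/mesos /opt/mapr/mesos/tmp /opt/mapr/mesos/tmp/slave
# and the slave has to point at that directory:
#   mesos-slave --work_dir=/opt/mapr/mesos/tmp/slave ...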
On point 4: right now I am running my slaves with a work directory of /opt/mapr/mesos/tmp/slave, because I have space issues on / on some of my nodes. Even if I pointed it at /tmp I would run into the same problem. If I found a new place to put the work directory on every slave, where everything from / down to the .cfg file was owned by root and writable only by root, then it would work. But would other frameworks fail? Or would their chown process actually fix things so they could write? This seems like a huge workaround to get Myriad running. At this point, is there another way to get Myriad running, or is running everything as root the only way? Just trying to get Myriad back up and running here.

On Tue, Sep 8, 2015 at 9:30 PM, Darin Johnson <[email protected]> wrote:

> Yuliya, the reason for the "chown frameworkUser ." step is that the executor
> (as frameworkUser) must write some files to the MESOS_DIRECTORY,
> specifically stderr, stdout and, at the time, the capsule dir (now
> obsolete). I suppose we could touch these files and then give them the
> proper permissions.
>
> I was planning to remove a lot of the code once MESOS-1790 is resolved; Jim
> submitted a patch already. In particular, there would no longer be a
> frameworkSuperUser (it's there so we can extract the tarball and preserve
> ownership/permissions for container-executor), and the frameworkUser would
> just run the yarn nodemanager. If we continue to require the
> MESOS_DIRECTORY to be owned by root, we'll be required to keep running
> it much as we do currently. I really don't like the idea
> of running frameworks as root or even with passwordless sudo if I can help
> it, but at the time it was the only workaround.
>
> So I guess the question is: is frameworkSuperUser something that we'd like to
> eventually deprecate, or is it here for good? Also, I should comment on
> MESOS-1790 to see what's going on with the patch.
>
> Darin
>
>
>
> On Sep 8, 2015 7:12 PM, "yuliya Feldman" <[email protected]> wrote:
>
> > John,
> > It is a problem with permissions for container-executor.cfg - it requires
> > the whole path to it to be owned by root.
> > One step is to change the work-dir for mesos-slave to point to a different
> > directory (not /tmp) that is writable only by root.
> > It still does not solve the full issue, since the binary distro changes the
> > permissions of the distro directory to the framework user.
> > If the framework user is root and Myriad is running as root it can be solved;
> > otherwise we need changes to the binary distro code.
> > I was planning to do it, but got distracted by other stuff. Will try to
> > look at it this week.
> > Thanks, Yuliya
> >
> > From: John Omernik <[email protected]>
> > To: [email protected]; yuliya Feldman <[email protected]>
> > Sent: Tuesday, September 8, 2015 1:31 PM
> > Subject: Re: Getting Nodes to be "Running" in Mesos
> >
> > interesting...
when I did root as the framework user then I got this: > > > > ExitCodeException exitCode=24: File /tmp must not be world or group > > writable, but is 1777 > > > > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) > > at org.apache.hadoop.util.Shell.run(Shell.java:456) > > at > > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) > > at > > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210) > > at > > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511) > > 15/09/08 15:30:38 INFO nodemanager.ContainerExecutor: > > 15/09/08 15:30:38 INFO service.AbstractService: Service NodeManager > > failed in state INITED; cause: > > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to > > initialize container executor > > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to > > initialize container executor > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212) > > at > > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511) > > Caused by: java.io.IOException: Linux container executor not > > configured properly (error=24) > > at > > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210) > > ... 3 more > > Caused by: ExitCodeException exitCode=24: File /tmp must not be world > > or group writable, but is 1777 > > > > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) > > at org.apache.hadoop.util.Shell.run(Shell.java:456) > > at > > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) > > at > > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182) > > ... 
4 more > > 15/09/08 15:30:38 WARN service.AbstractService: When stopping the > > service NodeManager : java.lang.NullPointerException > > java.lang.NullPointerException > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:162) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:274) > > at > > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > > at > > > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > > at > > > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > > at > > org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511) > > 15/09/08 15:30:38 FATAL nodemanager.NodeManager: Error starting > NodeManager > > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to > > initialize container executor > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212) > > at > > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511) > > Caused by: java.io.IOException: Linux container executor not > > configured properly (error=24) > > at > > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210) > > ... 3 more > > Caused by: ExitCodeException exitCode=24: File /tmp must not be world > > or group writable, but is 1777 > > > > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) > > at org.apache.hadoop.util.Shell.run(Shell.java:456) > > at > > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) > > at > > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182) > > ... 4 more > > 15/09/08 15:30:38 INFO nodemanager.NodeManager: SHUTDOWN_MSG: > > > > > > On Tue, Sep 8, 2015 at 3:26 PM, John Omernik <[email protected]> wrote: > > > > > So some progress: I am getting the error below complaining about > > ownership > > > of files. In marathon I have user:root on my task, in the myriad > > config, I > > > have mapr is user 700, so I am unsure on that, I will try with > > > framworkUser being root, see if that works? > > > > > > frameworkUser: mapr # Should be the same user running the resource > > manager. > > > > > > frameworkSuperUser: darkness # Must be root or have passwordless sudo > on > > > all nodes! > > > > > > > > > > > > at > > > > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210) > > > ... 
3 more > > > Caused by: ExitCodeException exitCode=24: File > > > > > > /tmp/mesos/slaves/20150907-111332-1660987584-5050-8033-S1/frameworks/20150907-111332-1660987584-5050-8033-0005/executors/myriad_executor20150907-111332-1660987584-5050-8033-000520150907-111332-1660987584-5050-8033-O12269720150907-111332-1660987584-5050-8033-S1/runs/8c48f443-f768-45b1-8cb2-55ff5b5a99d8 > > > must be owned by root, but is owned by 700 > > > > > > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) > > > at org.apache.hadoop.util.Shell.run(Shell.java:456) > > > at > > > > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) > > > at > > > > > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182) > > > ... 4 more > > > 15/09/08 15:23:24 WARN service.AbstractService: When stopping the > service > > > NodeManager : java.lang.NullPointerException > > > java.lang.NullPointerException > > > at > > > > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:162) > > > at > > > > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:274) > > > at > > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > > > at > > > > > > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > > > at > > > > > > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > > > at > > org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) > > > at > > > > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463) > > > at > > > > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511) > > > 15/09/08 15:23:24 FATAL nodemanager.NodeManager: Error starting > > NodeManager > > > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to > > > initialize container executor > > > at > > > > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212) > > > at > > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > > > at > > > > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463) > > > at > > > > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511) > > > Caused by: java.io.IOException: Linux container executor not configured > > > properly (error=24) > > > at > > > > > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188) > > > at > > > > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210) > > > ... 3 more > > > Caused by: ExitCodeException exitCode=24: File > > > > > > /tmp/mesos/slaves/20150907-111332-1660987584-5050-8033-S1/frameworks/20150907-111332-1660987584-5050-8033-0005/executors/myriad_executor20150907-111332-1660987584-5050-8033-000520150907-111332-1660987584-5050-8033-O12269720150907-111332-1660987584-5050-8033-S1/runs/8c48f443-f768-45b1-8cb2-55ff5b5a99d8 > > > must be owned by root, but is owned by 700 > > > > > > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) > > > at org.apache.hadoop.util.Shell.run(Shell.java:456) > > > at > > > > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) > > > at > > > > > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182) > > > ... 
4 more > > > 15/09/08 15:23:24 INFO nodemanager.NodeManager: SHUTDOWN_MSG: > > > /************************************************************ > > > SHUTDOWN_MSG: Shutting down NodeManager at > > > hadoopmapr2.brewingintel.com/192.168.0.99 > > > ************************************************************/ > > > > > > On Tue, Sep 8, 2015 at 3:23 PM, John Omernik <[email protected]> wrote: > > > > > >> Also a side note: The Flexing up and now having to have at least one > > >> node manager specified at startup: > > >> > > >> nmInstances: # NMs to start with. Requires at least 1 NM with a > non-zero > > >> profile. > > >> > > >> medium: 1 # <profile_name : instances> > > >> > > >> > > >> Is going to lead to task failures with mesos dns because the name > won't > > >> be ready right away (1 minute delay after kicking off Myriad) do we > > NEED to > > >> have a non-0 profile nodemanager startup with the resource manager? > > >> > > >> On Tue, Sep 8, 2015 at 3:16 PM, John Omernik <[email protected]> > wrote: > > >> > > >>> Cool. Question about the yarn-site.xml in general. > > >>> > > >>> I was struggling with some things in the wiki on this page: > > >>> > > > https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators > > >>> > > >>> Basically in step 5: > > >>> Step 5: Configure YARN to use Myriad > > >>> > > >>> Modify the */opt/hadoop-2.7.0/etc/hadoop/yarn-site.xml* file as > > >>> instructed in Sample: myriad-config-default.yml > > >>> < > > > https://cwiki.apache.org/confluence/display/MYRIAD/Sample%3A+myriad-config-default.yml > > > > > >>> . > > >>> > > >>> > > >>> (It should not link to the yml, but to the yarn site, side issue) it > > has > > >>> us put that information in the yarn-site.xml This makes sense. The > > >>> resource manager needs to be aware of the myriad stuff. > > >>> > > >>> Then I go to create a tarbal, (which I SHOULD be able to use for both > > >>> resource manager and nodemanager... right?) However, the instructions > > state > > >>> to remove the *.xml files. > > >>> > > >>> Step 6: Create the Tarball > > >>> > > >>> The tarball has all of the files needed for the Node Managers and > > >>> Resource Managers. The following shows how to create the tarball and > > place > > >>> it in HDFS: > > >>> cd ~ > > >>> sudo cp -rp /opt/hadoop-2.7.0 . > > >>> sudo rm hadoop-2.7.0/etc/hadoop/*.xml > > >>> sudo tar -zcpf ~/hadoop-2.7.0.tar.gz hadoop-2.7.0 > > >>> hadoop fs -put ~/hadoop-2.7.0.tar.gz /dist > > >>> > > >>> > > >>> What I ended up doing... since I am running the resourcemanager > > (myriad) > > >>> in marathon, is I created two tarballs. One is my > > hadoop-2.7.0-RM.tar.gz > > >>> which has the all the xml files still in the tar ball for shipping to > > >>> marathon. Then other is hadoop-2.7.0-NM.tar.gz which per the > > instructions > > >>> removes the *.xml files from the /etc/hadoop/ directory. > > >>> > > >>> > > >>> I guess... my logic is that myriad creates the conf directory for the > > >>> nodemanagers... but then I thought, and I overthinking something? Am > I > > >>> missing something? Could that be factoring into what I am doing here? > > >>> > > >>> > > >>> Obviously my first steps are to add the extra yarn-site.xml entries, > > but > > >>> in this current setup, they are only going into the resource manager > > >>> yarn-site as the the node-managers don't have a yarn-site in their > > >>> directories. 
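(For reference, the two-tarball approach described above amounts to roughly the following; the -RM/-NM names are just the ones I picked, and the paths come from the wiki steps quoted above.)

cd ~
sudo cp -rp /opt/hadoop-2.7.0 .
# RM tarball: keep the *.xml config files so the resource manager shipped via
# Marathon still has its yarn-site.xml
sudo tar -zcpf ~/hadoop-2.7.0-RM.tar.gz hadoop-2.7.0
# NM tarball: strip the *.xml files per the wiki instructions and push it to
# the distributed filesystem for the node managers
sudo rm hadoop-2.7.0/etc/hadoop/*.xml
sudo tar -zcpf ~/hadoop-2.7.0-NM.tar.gz hadoop-2.7.0
hadoop fs -put ~/hadoop-2.7.0-NM.tar.gz /dist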
> > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> > > >>> On Tue, Sep 8, 2015 at 3:09 PM, yuliya Feldman < > > >>> [email protected]> wrote: > > >>> > > >>>> Take a look at : https://github.com/mesos/myriad/pull/128 > > >>>> for yarn-site.xml updates > > >>>> > > >>>> From: John Omernik <[email protected]> > > >>>> To: [email protected] > > >>>> Sent: Tuesday, September 8, 2015 12:38 PM > > >>>> Subject: Getting Nodes to be "Running" in Mesos > > >>>> > > >>>> So I am playing around with a recent build of Myriad, and I am using > > >>>> MapR > > >>>> 5.0 (hadoop-2.7.0) I hate to use the dev list as a "help Myriad > won't > > >>>> run" > > >>>> forum, so please forgive me if I am using the list wrong. > > >>>> > > >>>> Basically, I seem to be able to get myriad running, and the things > up, > > >>>> and > > >>>> it tries to start a nodemanager. > > >>>> > > >>>> In mesos, the status of the nodemanager task never gets past > staging, > > >>>> and > > >>>> eventually, fails. The logs for both the node manager and myriad, > > seem > > >>>> to > > >>>> look healthy, and I am not sure where I should look next to > > troubleshoot > > >>>> what is happening. Basically you can see the registration of the > > >>>> nodemanager, and then it fails with no error in the logs... Any > > thoughts > > >>>> would be appreciated on where I can look next for troubleshooting. > > >>>> > > >>>> > > >>>> Node Manager Logs (complete) > > >>>> > > >>>> STARTUP_MSG: build = [email protected]:mapr/private-hadoop-common.git > > >>>> -r fc95119f587541fb3a9af0dbeeed23c974178115; compiled by 'root' on > > >>>> 2015-08-19T20:02Z > > >>>> STARTUP_MSG: java = 1.8.0_45-internal > > >>>> ************************************************************/ > > >>>> 15/09/08 14:35:23 INFO nodemanager.NodeManager: registered UNIX > signal > > >>>> handlers for [TERM, HUP, INT] > > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > > >>>> > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType > > >>>> for class > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher > > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > > >>>> > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationEventType > > >>>> for class > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher > > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > > >>>> > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizationEventType > > >>>> for class > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService > > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > > >>>> > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServicesEventType > > >>>> for class > > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices > > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > > >>>> > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorEventType > > >>>> for class > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > > >>>> > > >>>> > > > 
org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEventType > > >>>> for class > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher > > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > > >>>> org.apache.hadoop.yarn.server.nodemanager.ContainerManagerEventType > > >>>> for class > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl > > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > > >>>> org.apache.hadoop.yarn.server.nodemanager.NodeManagerEventType for > > >>>> class org.apache.hadoop.yarn.server.nodemanager.NodeManager > > >>>> 15/09/08 14:35:24 INFO impl.MetricsConfig: loaded properties from > > >>>> hadoop-metrics2.properties > > >>>> 15/09/08 14:35:24 INFO impl.MetricsSystemImpl: Scheduled snapshot > > >>>> period at 10 second(s). > > >>>> 15/09/08 14:35:24 INFO impl.MetricsSystemImpl: NodeManager metrics > > >>>> system started > > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > > >>>> > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType > > >>>> for class > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler > > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > > >>>> > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadEventType > > >>>> for class > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadService > > >>>> 15/09/08 14:35:24 INFO localizer.ResourceLocalizationService: per > > >>>> directory file limit = 8192 > > >>>> 15/09/08 14:35:24 INFO localizer.ResourceLocalizationService: > > >>>> usercache path : > > >>>> file:///tmp/hadoop-mapr/nm-local-dir/usercache_DEL_1441740924753 > > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > > >>>> > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizerEventType > > >>>> for class > > >>>> > > > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker > > >>>> 15/09/08 14:35:24 WARN containermanager.AuxServices: The Auxilurary > > >>>> Service named 'mapreduce_shuffle' in the configuration is for class > > >>>> org.apache.hadoop.mapred.ShuffleHandler which has a name of > > >>>> 'httpshuffle'. Because these are not the same tools trying to send > > >>>> ServiceData and read Service Meta Data may have issues unless the > > >>>> refer to the name in the config. 
> > >>>> 15/09/08 14:35:24 INFO containermanager.AuxServices: Adding > auxiliary > > >>>> service httpshuffle, "mapreduce_shuffle" > > >>>> 15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl: Using > > >>>> ResourceCalculatorPlugin : > > >>>> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@1a5b6f42 > > >>>> 15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl: Using > > >>>> ResourceCalculatorProcessTree : null > > >>>> 15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl: Physical > memory > > >>>> check enabled: true > > >>>> 15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl: Virtual memory > > >>>> check enabled: false > > >>>> 15/09/08 14:35:24 INFO nodemanager.NodeStatusUpdaterImpl: > Initialized > > >>>> nodemanager for null: physical-memory=16384 virtual-memory=34407 > > >>>> virtual-cores=4 disks=4.0 > > >>>> 15/09/08 14:35:24 INFO ipc.CallQueueManager: Using callQueue class > > >>>> java.util.concurrent.LinkedBlockingQueue > > >>>> 15/09/08 14:35:24 INFO ipc.Server: Starting Socket Reader #1 for > port > > >>>> 55449 > > >>>> 15/09/08 14:35:24 INFO pb.RpcServerFactoryPBImpl: Adding protocol > > >>>> org.apache.hadoop.yarn.api.ContainerManagementProtocolPB to the > server > > >>>> 15/09/08 14:35:24 INFO containermanager.ContainerManagerImpl: > Blocking > > >>>> new container-requests as container manager rpc server is still > > >>>> starting. > > >>>> 15/09/08 14:35:24 INFO ipc.Server: IPC Server Responder: starting > > >>>> 15/09/08 14:35:24 INFO ipc.Server: IPC Server listener on 55449: > > >>>> starting > > >>>> 15/09/08 14:35:24 INFO security.NMContainerTokenSecretManager: > > >>>> Updating node address : hadoopmapr5.brewingintel.com:55449 > > >>>> 15/09/08 14:35:24 INFO ipc.CallQueueManager: Using callQueue class > > >>>> java.util.concurrent.LinkedBlockingQueue > > >>>> 15/09/08 14:35:24 INFO ipc.Server: Starting Socket Reader #1 for > port > > >>>> 8040 > > >>>> 15/09/08 14:35:24 INFO pb.RpcServerFactoryPBImpl: Adding protocol > > >>>> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB > > >>>> to the server > > >>>> 15/09/08 14:35:24 INFO ipc.Server: IPC Server Responder: starting > > >>>> 15/09/08 14:35:24 INFO ipc.Server: IPC Server listener on 8040: > > starting > > >>>> 15/09/08 14:35:24 INFO localizer.ResourceLocalizationService: > > >>>> Localizer started on port 8040 > > >>>> 15/09/08 14:35:24 INFO mapred.IndexCache: IndexCache created with > max > > >>>> memory = 10485760 > > >>>> 15/09/08 14:35:24 INFO mapred.ShuffleHandler: httpshuffle listening > on > > >>>> port 13562 > > >>>> 15/09/08 14:35:24 INFO containermanager.ContainerManagerImpl: > > >>>> ContainerManager started at hadoopmapr5/192.168.0.96:55449 > > >>>> 15/09/08 14:35:24 INFO containermanager.ContainerManagerImpl: > > >>>> ContainerManager bound to 0.0.0.0/0.0.0.0:0 > > >>>> 15/09/08 14:35:24 INFO webapp.WebServer: Instantiating NMWebApp at > > >>>> 0.0.0.0:8042 > > >>>> 15/09/08 14:35:24 INFO mortbay.log: Logging to > > >>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via > > >>>> org.mortbay.log.Slf4jLog > > >>>> 15/09/08 14:35:24 INFO http.HttpRequestLog: Http request log for > > >>>> http.requests.nodemanager is not defined > > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Added global filter > 'safety' > > >>>> (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter) > > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Added filter > > >>>> static_user_filter > > >>>> > > (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) > > >>>> to 
context node > > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Added filter > > >>>> static_user_filter > > >>>> > > (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) > > >>>> to context static > > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Added filter > > >>>> static_user_filter > > >>>> > > (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) > > >>>> to context logs > > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: adding path spec: /node/* > > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: adding path spec: /ws/* > > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Jetty bound to port 8042 > > >>>> 15/09/08 14:35:24 INFO mortbay.log: jetty-6.1.26 > > >>>> 15/09/08 14:35:24 INFO mortbay.log: Extract > > >>>> > > >>>> > > > jar:file:/tmp/mesos/slaves/20150907-111332-1660987584-5050-8033-S3/frameworks/20150907-111332-1660987584-5050-8033-0003/executors/myriad_executor20150907-111332-1660987584-5050-8033-000320150907-111332-1660987584-5050-8033-O11824820150907-111332-1660987584-5050-8033-S3/runs/67cc8f37-b6d4-4018-a9b4-0071d020c9a5/hadoop-2.7.0/share/hadoop/yarn/hadoop-yarn-common-2.7.0-mapr-1506.jar!/webapps/node > > >>>> to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp > > >>>> 15/09/08 14:35:25 INFO mortbay.log: Started > > >>>> [email protected]:8042 > > >>>> 15/09/08 14:35:25 INFO webapp.WebApps: Web app /node started at 8042 > > >>>> 15/09/08 14:35:25 INFO webapp.WebApps: Registered webapp guice > modules > > >>>> 15/09/08 14:35:25 INFO client.RMProxy: Connecting to ResourceManager > > >>>> at myriad.marathon.mesos/192.168.0.99:8031 > > >>>> 15/09/08 14:35:25 INFO nodemanager.NodeStatusUpdaterImpl: Sending > out > > >>>> 0 NM container statuses: [] > > >>>> 15/09/08 14:35:25 INFO nodemanager.NodeStatusUpdaterImpl: > Registering > > >>>> with RM using containers :[] > > >>>> 15/09/08 14:35:25 INFO security.NMContainerTokenSecretManager: > Rolling > > >>>> master-key for container-tokens, got key with id 338249572 > > >>>> 15/09/08 14:35:25 INFO security.NMTokenSecretManagerInNM: Rolling > > >>>> master-key for container-tokens, got key with id -362725484 > > >>>> 15/09/08 14:35:25 INFO nodemanager.NodeStatusUpdaterImpl: Registered > > >>>> with ResourceManager as hadoopmapr5.brewingintel.com:55449 with > total > > >>>> resource of <memory:16384, vCores:4, disks:4.0> > > >>>> 15/09/08 14:35:25 INFO nodemanager.NodeStatusUpdaterImpl: Notifying > > >>>> ContainerManager to unblock new container-requests > > >>>> > > >>>> > > >>>> Except of Myriad logs: > > >>>> > > >>>> /09/08 14:35:12 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:13 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:15 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 3 > > >>>> 15/09/08 14:35:16 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:17 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:18 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:19 INFO handlers.StatusUpdateEventHandler: Status > > >>>> Update for task: value: > > >>>> "nm.medium.323f6664-11ca-477b-9e6e-41fb7547eacf" > > >>>> | state: TASK_FAILED > > >>>> 15/09/08 14:35:19 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:19 INFO scheduler.DownloadNMExecutorCLGenImpl: Using > > >>>> remote distribution > > >>>> 15/09/08 14:35:19 INFO 
scheduler.TaskFactory$NMTaskFactoryImpl: > > >>>> Getting Hadoop distribution > > >>>> from:maprfs:///mesos/myriad/hadoop-2.7.0.tar.gz > > >>>> 15/09/08 14:35:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: > > >>>> Getting config from:http://myriad.marathon.mesos:8088/conf > > >>>> 15/09/08 14:35:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: > Slave > > >>>> will execute command:sudo tar -zxpf hadoop-2.7.0.tar.gz && sudo > chown > > >>>> mapr . && cp conf hadoop-2.7.0/etc/hadoop/yarn-site.xml; export > > >>>> YARN_HOME=hadoop-2.7.0; sudo -E -u mapr -H env > > >>>> YARN_HOME="hadoop-2.7.0" > > >>>> YARN_NODEMANAGER_OPTS="-Dnodemanager.resource.io-spindles=4.0 > > >>>> -Dyarn.resourcemanager.hostname=myriad.marathon.mesos > > >>>> > > >>>> > > > -Dyarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor > > >>>> -Dnodemanager.resource.cpu-vcores=4 > > >>>> -Dnodemanager.resource.memory-mb=16384 > > >>>> -Dmyriad.yarn.nodemanager.address=0.0.0.0:31000 > > >>>> -Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31001 > > >>>> -Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31002 > > >>>> -Dmyriad.mapreduce.shuffle.port=0.0.0.0:31003" $YARN_HOME/bin/yarn > > >>>> nodemanager > > >>>> 15/09/08 14:35:19 INFO handlers.ResourceOffersEventHandler: > Launching > > >>>> task: nm.medium.323f6664-11ca-477b-9e6e-41fb7547eacf using offer: > > >>>> value: "20150907-111332-1660987584-5050-8033-O118248" > > >>>> > > >>>> 15/09/08 14:35:20 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 2 > > >>>> 15/09/08 14:35:21 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:21 INFO util.AbstractLivelinessMonitor: > > >>>> Expired:hadoopmapr5.brewingintel.com:52878 Timed out after 2 secs > > >>>> 15/09/08 14:35:21 INFO rmnode.RMNodeImpl: Deactivating Node > > >>>> hadoopmapr5.brewingintel.com:52878 as it is now LOST > > >>>> 15/09/08 14:35:21 INFO rmnode.RMNodeImpl: > > >>>> hadoopmapr5.brewingintel.com:52878 Node Transitioned from RUNNING > to > > >>>> LOST > > >>>> 15/09/08 14:35:21 INFO fair.FairScheduler: Removed node > > >>>> hadoopmapr5.brewingintel.com:52878 cluster capacity: <memory:0, > > >>>> vCores:0, disks:0.0> > > >>>> 15/09/08 14:35:22 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:23 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:25 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 3 > > >>>> 15/09/08 14:35:25 INFO util.RackResolver: Resolved > > >>>> hadoopmapr5.brewingintel.com to /default-rack > > >>>> 15/09/08 14:35:25 INFO resourcemanager.ResourceTrackerService: > > >>>> NodeManager from node hadoopmapr5.brewingintel.com(cmPort: 55449 > > >>>> httpPort: 8042) registered with capability: <memory:16384, vCores:4, > > >>>> disks:4.0>, assigned nodeId hadoopmapr5.brewingintel.com:55449 > > >>>> 15/09/08 14:35:25 INFO rmnode.RMNodeImpl: > > >>>> hadoopmapr5.brewingintel.com:55449 Node Transitioned from NEW to > > >>>> RUNNING > > >>>> 15/09/08 14:35:25 INFO fair.FairScheduler: Added node > > >>>> hadoopmapr5.brewingintel.com:55449 cluster capacity: <memory:16384, > > >>>> vCores:4, disks:4.0> > > >>>> 15/09/08 14:35:26 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:27 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:28 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > 
>>>> 15/09/08 14:35:30 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 3 > > >>>> 15/09/08 14:35:31 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:32 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:33 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:35 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 3 > > >>>> 15/09/08 14:35:36 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:37 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:38 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:40 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 3 > > >>>> 15/09/08 14:35:41 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:42 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:43 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:45 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 3 > > >>>> 15/09/08 14:35:46 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:47 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:48 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:50 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 3 > > >>>> 15/09/08 14:35:51 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:52 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:53 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:55 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 3 > > >>>> 15/09/08 14:35:56 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:57 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:35:58 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:00 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 3 > > >>>> 15/09/08 14:36:01 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:02 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:03 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:05 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 3 > > >>>> 15/09/08 14:36:06 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:07 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:08 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:10 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 3 > > >>>> 15/09/08 14:36:11 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:12 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:13 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:15 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 3 > > 
>>>> 15/09/08 14:36:16 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:17 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:18 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:19 INFO handlers.StatusUpdateEventHandler: Status > > >>>> Update for task: value: > > >>>> "nm.medium.323f6664-11ca-477b-9e6e-41fb7547eacf" > > >>>> | state: TASK_FAILED > > >>>> 15/09/08 14:36:19 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:19 INFO scheduler.DownloadNMExecutorCLGenImpl: Using > > >>>> remote distribution > > >>>> 15/09/08 14:36:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: > > >>>> Getting Hadoop distribution > > >>>> from:maprfs:///mesos/myriad/hadoop-2.7.0.tar.gz > > >>>> 15/09/08 14:36:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: > > >>>> Getting config from:http://myriad.marathon.mesos:8088/conf > > >>>> 15/09/08 14:36:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: > Slave > > >>>> will execute command:sudo tar -zxpf hadoop-2.7.0.tar.gz && sudo > chown > > >>>> mapr . && cp conf hadoop-2.7.0/etc/hadoop/yarn-site.xml; export > > >>>> YARN_HOME=hadoop-2.7.0; sudo -E -u mapr -H env > > >>>> YARN_HOME="hadoop-2.7.0" > > >>>> YARN_NODEMANAGER_OPTS="-Dnodemanager.resource.io-spindles=4.0 > > >>>> -Dyarn.resourcemanager.hostname=myriad.marathon.mesos > > >>>> > > >>>> > > > -Dyarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor > > >>>> -Dnodemanager.resource.cpu-vcores=4 > > >>>> -Dnodemanager.resource.memory-mb=16384 > > >>>> -Dmyriad.yarn.nodemanager.address=0.0.0.0:31000 > > >>>> -Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31001 > > >>>> -Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31002 > > >>>> -Dmyriad.mapreduce.shuffle.port=0.0.0.0:31003" $YARN_HOME/bin/yarn > > >>>> nodemanager > > >>>> 15/09/08 14:36:19 INFO handlers.ResourceOffersEventHandler: > Launching > > >>>> task: nm.medium.323f6664-11ca-477b-9e6e-41fb7547eacf using offer: > > >>>> value: "20150907-111332-1660987584-5050-8033-O118392" > > >>>> > > >>>> 15/09/08 14:36:20 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 2 > > >>>> 15/09/08 14:36:20 INFO util.AbstractLivelinessMonitor: > > >>>> Expired:hadoopmapr5.brewingintel.com:55449 Timed out after 2 secs > > >>>> 15/09/08 14:36:20 INFO rmnode.RMNodeImpl: Deactivating Node > > >>>> hadoopmapr5.brewingintel.com:55449 as it is now LOST > > >>>> 15/09/08 14:36:20 INFO rmnode.RMNodeImpl: > > >>>> hadoopmapr5.brewingintel.com:55449 Node Transitioned from RUNNING > to > > >>>> LOST > > >>>> 15/09/08 14:36:20 INFO fair.FairScheduler: Removed node > > >>>> hadoopmapr5.brewingintel.com:55449 cluster capacity: <memory:0, > > >>>> vCores:0, disks:0.0> > > >>>> 15/09/08 14:36:22 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 2 > > >>>> 15/09/08 14:36:23 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:24 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:25 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 2 > > >>>> 15/09/08 14:36:25 INFO util.RackResolver: Resolved > > >>>> hadoopmapr5.brewingintel.com to /default-rack > > >>>> 15/09/08 14:36:25 INFO resourcemanager.ResourceTrackerService: > > >>>> NodeManager from node hadoopmapr5.brewingintel.com(cmPort: 40378 > > >>>> httpPort: 8042) registered with 
capability: <memory:16384, vCores:4, > > >>>> disks:4.0>, assigned nodeId hadoopmapr5.brewingintel.com:40378 > > >>>> 15/09/08 14:36:25 INFO rmnode.RMNodeImpl: > > >>>> hadoopmapr5.brewingintel.com:40378 Node Transitioned from NEW to > > >>>> RUNNING > > >>>> 15/09/08 14:36:25 INFO fair.FairScheduler: Added node > > >>>> hadoopmapr5.brewingintel.com:40378 cluster capacity: <memory:16384, > > >>>> vCores:4, disks:4.0> > > >>>> 15/09/08 14:36:27 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 2 > > >>>> 15/09/08 14:36:28 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:29 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 1 > > >>>> 15/09/08 14:36:30 INFO handlers.ResourceOffersEventHandler: Received > > >>>> offers 2 > > >>>> > > >>>> > > >>>> > > >>> > > >>> > > >> > > > > > > > >
