Yuliya, the reason for the chown frameworkUser . is that the executor (as frameworkUser) must write some files to the MESOS_DIRECTORY, specifically stderr, stdout and, at the time, the capsule dir (now obsolete). I suppose we could touch these files and then give them the proper permissions.
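Roughly what I have in mind, as an untested sketch (mapr stands in for the frameworkUser here, and stdout/stderr are the files the executor actually has to write):

cd "$MESOS_DIRECTORY"
# Pre-create only the files the executor needs to write...
touch stdout stderr
# ...and chown just those, so the sandbox directory itself can stay root-owned.
chown mapr stdout stderr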
I was planning to remove a lot of the code once MESOS-1790 is resolved; Jim submitted a patch already. In particular, there would no longer be a frameworkSuperUser (it's there so we can extract the tarball and preserve ownership/permissions for container-executor), and the frameworkUser would just run the yarn nodemanager. If we continue to require the MESOS_DIRECTORY to be owned by root, we'll be required to keep running things much the way we do currently. I really don't like the idea of running frameworks as root, or even with passwordless sudo, if I can help it, but at the time it was the only workaround. So I guess the question is: is frameworkSuperUser something that we'd like to eventually deprecate, or is it here for good? Also, I should comment on MESOS-1790 to see what's going on with the patch. Darin On Sep 8, 2015 7:12 PM, "yuliya Feldman" <[email protected]> wrote: > John, > It is a problem with permissions for container-executor.cfg - it requires > the whole path to it to be owned by root. > One step is to change the work-dir for mesos-slave to point to a different > directory (not tmp) that is writable only by root. > It still does not solve the full issue, since the binary distro is changing > permissions of the distro directory to a framework user. > If the framework user is root and myriad is running as root it can be solved; > otherwise we need changes to the binary distro code. > I was planning to do it, but got distracted by other stuff. Will try to > look at it this week. > Thanks, Yuliya > From: John Omernik <[email protected]> > To: [email protected]; yuliya Feldman <[email protected]> > Sent: Tuesday, September 8, 2015 1:31 PM > Subject: Re: Getting Nodes to be "Running" in Mesos > > Interesting... when I did root as the framework user, I got this: > > ExitCodeException exitCode=24: File /tmp must not be world or group > writable, but is 1777 > > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) > at org.apache.hadoop.util.Shell.run(Shell.java:456) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511) > 15/09/08 15:30:38 INFO nodemanager.ContainerExecutor: > 15/09/08 15:30:38 INFO service.AbstractService: Service NodeManager > failed in state INITED; cause: > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to > initialize container executor > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to > initialize container executor > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511) > Caused by: java.io.IOException: Linux container executor not > configured properly (error=24) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188) > at > 
org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210) > ... 3 more > Caused by: ExitCodeException exitCode=24: File /tmp must not be world > or group writable, but is 1777 > > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) > at org.apache.hadoop.util.Shell.run(Shell.java:456) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182) > ... 4 more > 15/09/08 15:30:38 WARN service.AbstractService: When stopping the > service NodeManager : java.lang.NullPointerException > java.lang.NullPointerException > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:162) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:274) > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > at > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > at > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511) > 15/09/08 15:30:38 FATAL nodemanager.NodeManager: Error starting NodeManager > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to > initialize container executor > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212) > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511) > Caused by: java.io.IOException: Linux container executor not > configured properly (error=24) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188) > at > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210) > ... 3 more > Caused by: ExitCodeException exitCode=24: File /tmp must not be world > or group writable, but is 1777 > > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) > at org.apache.hadoop.util.Shell.run(Shell.java:456) > at > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) > at > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182) > ... 4 more > 15/09/08 15:30:38 INFO nodemanager.NodeManager: SHUTDOWN_MSG: > > > On Tue, Sep 8, 2015 at 3:26 PM, John Omernik <[email protected]> wrote: > > > So some progress: I am getting the error below complaining about > ownership > > of files. In marathon I have user:root on my task; in the myriad > config, I > > have mapr, which is user 700, so I am unsure on that. I will try with > > frameworkUser being root and see if that works. > > > > frameworkUser: mapr # Should be the same user running the resource > manager. > > > > frameworkSuperUser: darkness # Must be root or have passwordless sudo on > > all nodes! > > > > > > > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210) > > ... 
3 more > > Caused by: ExitCodeException exitCode=24: File > > > /tmp/mesos/slaves/20150907-111332-1660987584-5050-8033-S1/frameworks/20150907-111332-1660987584-5050-8033-0005/executors/myriad_executor20150907-111332-1660987584-5050-8033-000520150907-111332-1660987584-5050-8033-O12269720150907-111332-1660987584-5050-8033-S1/runs/8c48f443-f768-45b1-8cb2-55ff5b5a99d8 > > must be owned by root, but is owned by 700 > > > > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) > > at org.apache.hadoop.util.Shell.run(Shell.java:456) > > at > > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) > > at > > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182) > > ... 4 more > > 15/09/08 15:23:24 WARN service.AbstractService: When stopping the service > > NodeManager : java.lang.NullPointerException > > java.lang.NullPointerException > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:162) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:274) > > at > org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221) > > at > > > org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52) > > at > > > org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80) > > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:171) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511) > > 15/09/08 15:23:24 FATAL nodemanager.NodeManager: Error starting > NodeManager > > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to > > initialize container executor > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212) > > at > org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511) > > Caused by: java.io.IOException: Linux container executor not configured > > properly (error=24) > > at > > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188) > > at > > > org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210) > > ... 3 more > > Caused by: ExitCodeException exitCode=24: File > > > /tmp/mesos/slaves/20150907-111332-1660987584-5050-8033-S1/frameworks/20150907-111332-1660987584-5050-8033-0005/executors/myriad_executor20150907-111332-1660987584-5050-8033-000520150907-111332-1660987584-5050-8033-O12269720150907-111332-1660987584-5050-8033-S1/runs/8c48f443-f768-45b1-8cb2-55ff5b5a99d8 > > must be owned by root, but is owned by 700 > > > > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545) > > at org.apache.hadoop.util.Shell.run(Shell.java:456) > > at > > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722) > > at > > > org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182) > > ... 
4 more > > 15/09/08 15:23:24 INFO nodemanager.NodeManager: SHUTDOWN_MSG: > > /************************************************************ > > SHUTDOWN_MSG: Shutting down NodeManager at > > hadoopmapr2.brewingintel.com/192.168.0.99 > > ************************************************************/ > > > > On Tue, Sep 8, 2015 at 3:23 PM, John Omernik <[email protected]> wrote: > > > >> Also a side note: the flexing up, and now having to have at least one > >> node manager specified at startup: > >> > >> nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero > >> profile. > >> > >> medium: 1 # <profile_name : instances> > >> > >> > >> is going to lead to task failures with mesos dns because the name won't > >> be ready right away (1 minute delay after kicking off Myriad). Do we > NEED to > >> have a non-zero profile nodemanager start up with the resource manager? > >> > >> On Tue, Sep 8, 2015 at 3:16 PM, John Omernik <[email protected]> wrote: > >> > >>> Cool. Question about the yarn-site.xml in general. > >>> > >>> I was struggling with some things in the wiki on this page: > >>> > https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators > >>> > >>> Basically in step 5: > >>> Step 5: Configure YARN to use Myriad > >>> > >>> Modify the */opt/hadoop-2.7.0/etc/hadoop/yarn-site.xml* file as > >>> instructed in Sample: myriad-config-default.yml > >>> < https://cwiki.apache.org/confluence/display/MYRIAD/Sample%3A+myriad-config-default.yml > > >>> . > >>> > >>> > >>> (Side issue: it should not link to the yml, but to the yarn-site page.) It has > >>> us put that information in the yarn-site.xml. This makes sense. The > >>> resource manager needs to be aware of the myriad stuff. > >>> > >>> Then I go to create a tarball (which I SHOULD be able to use for both > >>> resource manager and nodemanager... right?). However, the instructions state > >>> to remove the *.xml files. > >>> > >>> Step 6: Create the Tarball > >>> > >>> The tarball has all of the files needed for the Node Managers and > >>> Resource Managers. The following shows how to create the tarball and place > >>> it in HDFS: > >>> cd ~ > >>> sudo cp -rp /opt/hadoop-2.7.0 . > >>> sudo rm hadoop-2.7.0/etc/hadoop/*.xml > >>> sudo tar -zcpf ~/hadoop-2.7.0.tar.gz hadoop-2.7.0 > >>> hadoop fs -put ~/hadoop-2.7.0.tar.gz /dist > >>> > >>> > >>> What I ended up doing... since I am running the resourcemanager (myriad) > >>> in marathon, is I created two tarballs. One is my hadoop-2.7.0-RM.tar.gz > >>> which has all the xml files still in the tarball for shipping to > >>> marathon. The other is hadoop-2.7.0-NM.tar.gz which, per the instructions, > >>> has the *.xml files removed from the /etc/hadoop/ directory (rough commands > >>> at the end of this mail). > >>> > >>> > >>> I guess... my logic is that myriad creates the conf directory for the > >>> nodemanagers... but then I thought, am I overthinking something? Am I > >>> missing something? Could that be factoring into what I am doing here? > >>> > >>> > >>> Obviously my first steps are to add the extra yarn-site.xml entries, but > >>> in this current setup, they are only going into the resource manager > >>> yarn-site as the node-managers don't have a yarn-site in their > >>> directories. 
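> >>>
> >>> For reference, the two-tarball build mentioned above is roughly this
> >>> (an untested sketch; the -RM/-NM names are just my own convention):
> >>>
> >>> cd ~
> >>> sudo cp -rp /opt/hadoop-2.7.0 .
> >>> # RM tarball keeps etc/hadoop/*.xml, since it ships whole to marathon:
> >>> sudo tar -zcpf ~/hadoop-2.7.0-RM.tar.gz hadoop-2.7.0
> >>> # NM tarball drops the xmls per the wiki, since myriad serves the conf:
> >>> sudo rm hadoop-2.7.0/etc/hadoop/*.xml
> >>> sudo tar -zcpf ~/hadoop-2.7.0-NM.tar.gz hadoop-2.7.0
> >>> hadoop fs -put ~/hadoop-2.7.0-NM.tar.gz /dist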
> >>> > >>> > >>> > >>> > >>> > >>> > >>> On Tue, Sep 8, 2015 at 3:09 PM, yuliya Feldman < > >>> [email protected]> wrote: > >>> > >>>> Take a look at : https://github.com/mesos/myriad/pull/128 > >>>> for yarn-site.xml updates > >>>> > >>>> From: John Omernik <[email protected]> > >>>> To: [email protected] > >>>> Sent: Tuesday, September 8, 2015 12:38 PM > >>>> Subject: Getting Nodes to be "Running" in Mesos > >>>> > >>>> So I am playing around with a recent build of Myriad, and I am using > >>>> MapR > >>>> 5.0 (hadoop-2.7.0). I hate to use the dev list as a "help Myriad won't > >>>> run" > >>>> forum, so please forgive me if I am using the list wrong. > >>>> > >>>> Basically, I seem to be able to get myriad running and things up, > >>>> and > >>>> it tries to start a nodemanager. > >>>> > >>>> In mesos, the status of the nodemanager task never gets past staging, > >>>> and > >>>> eventually fails. The logs for both the node manager and myriad seem > >>>> to > >>>> look healthy, and I am not sure where I should look next to > troubleshoot > >>>> what is happening. Basically you can see the registration of the > >>>> nodemanager, and then it fails with no error in the logs... Any > thoughts > >>>> would be appreciated on where I can look next for troubleshooting. > >>>> > >>>> > >>>> Node Manager Logs (complete) > >>>> > >>>> STARTUP_MSG: build = [email protected]:mapr/private-hadoop-common.git > >>>> -r fc95119f587541fb3a9af0dbeeed23c974178115; compiled by 'root' on > >>>> 2015-08-19T20:02Z > >>>> STARTUP_MSG: java = 1.8.0_45-internal > >>>> ************************************************************/ > >>>> 15/09/08 14:35:23 INFO nodemanager.NodeManager: registered UNIX signal > >>>> handlers for [TERM, HUP, INT] > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > >>>> > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType > >>>> for class > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > >>>> > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationEventType > >>>> for class > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > >>>> > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizationEventType > >>>> for class > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > >>>> > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServicesEventType > >>>> for class > >>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > >>>> > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorEventType > >>>> for class > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > >>>> > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEventType > >>>> for class > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher > >>>> 
15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > >>>> org.apache.hadoop.yarn.server.nodemanager.ContainerManagerEventType > >>>> for class > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > >>>> org.apache.hadoop.yarn.server.nodemanager.NodeManagerEventType for > >>>> class org.apache.hadoop.yarn.server.nodemanager.NodeManager > >>>> 15/09/08 14:35:24 INFO impl.MetricsConfig: loaded properties from > >>>> hadoop-metrics2.properties > >>>> 15/09/08 14:35:24 INFO impl.MetricsSystemImpl: Scheduled snapshot > >>>> period at 10 second(s). > >>>> 15/09/08 14:35:24 INFO impl.MetricsSystemImpl: NodeManager metrics > >>>> system started > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > >>>> > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType > >>>> for class > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > >>>> > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadEventType > >>>> for class > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadService > >>>> 15/09/08 14:35:24 INFO localizer.ResourceLocalizationService: per > >>>> directory file limit = 8192 > >>>> 15/09/08 14:35:24 INFO localizer.ResourceLocalizationService: > >>>> usercache path : > >>>> file:///tmp/hadoop-mapr/nm-local-dir/usercache_DEL_1441740924753 > >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class > >>>> > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizerEventType > >>>> for class > >>>> > org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker > >>>> 15/09/08 14:35:24 WARN containermanager.AuxServices: The Auxilurary > >>>> Service named 'mapreduce_shuffle' in the configuration is for class > >>>> org.apache.hadoop.mapred.ShuffleHandler which has a name of > >>>> 'httpshuffle'. Because these are not the same tools trying to send > >>>> ServiceData and read Service Meta Data may have issues unless the > >>>> refer to the name in the config. 
> >>>> 15/09/08 14:35:24 INFO containermanager.AuxServices: Adding auxiliary > >>>> service httpshuffle, "mapreduce_shuffle" > >>>> 15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl: Using > >>>> ResourceCalculatorPlugin : > >>>> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@1a5b6f42 > >>>> 15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl: Using > >>>> ResourceCalculatorProcessTree : null > >>>> 15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl: Physical memory > >>>> check enabled: true > >>>> 15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl: Virtual memory > >>>> check enabled: false > >>>> 15/09/08 14:35:24 INFO nodemanager.NodeStatusUpdaterImpl: Initialized > >>>> nodemanager for null: physical-memory=16384 virtual-memory=34407 > >>>> virtual-cores=4 disks=4.0 > >>>> 15/09/08 14:35:24 INFO ipc.CallQueueManager: Using callQueue class > >>>> java.util.concurrent.LinkedBlockingQueue > >>>> 15/09/08 14:35:24 INFO ipc.Server: Starting Socket Reader #1 for port > >>>> 55449 > >>>> 15/09/08 14:35:24 INFO pb.RpcServerFactoryPBImpl: Adding protocol > >>>> org.apache.hadoop.yarn.api.ContainerManagementProtocolPB to the server > >>>> 15/09/08 14:35:24 INFO containermanager.ContainerManagerImpl: Blocking > >>>> new container-requests as container manager rpc server is still > >>>> starting. > >>>> 15/09/08 14:35:24 INFO ipc.Server: IPC Server Responder: starting > >>>> 15/09/08 14:35:24 INFO ipc.Server: IPC Server listener on 55449: > >>>> starting > >>>> 15/09/08 14:35:24 INFO security.NMContainerTokenSecretManager: > >>>> Updating node address : hadoopmapr5.brewingintel.com:55449 > >>>> 15/09/08 14:35:24 INFO ipc.CallQueueManager: Using callQueue class > >>>> java.util.concurrent.LinkedBlockingQueue > >>>> 15/09/08 14:35:24 INFO ipc.Server: Starting Socket Reader #1 for port > >>>> 8040 > >>>> 15/09/08 14:35:24 INFO pb.RpcServerFactoryPBImpl: Adding protocol > >>>> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB > >>>> to the server > >>>> 15/09/08 14:35:24 INFO ipc.Server: IPC Server Responder: starting > >>>> 15/09/08 14:35:24 INFO ipc.Server: IPC Server listener on 8040: > starting > >>>> 15/09/08 14:35:24 INFO localizer.ResourceLocalizationService: > >>>> Localizer started on port 8040 > >>>> 15/09/08 14:35:24 INFO mapred.IndexCache: IndexCache created with max > >>>> memory = 10485760 > >>>> 15/09/08 14:35:24 INFO mapred.ShuffleHandler: httpshuffle listening on > >>>> port 13562 > >>>> 15/09/08 14:35:24 INFO containermanager.ContainerManagerImpl: > >>>> ContainerManager started at hadoopmapr5/192.168.0.96:55449 > >>>> 15/09/08 14:35:24 INFO containermanager.ContainerManagerImpl: > >>>> ContainerManager bound to 0.0.0.0/0.0.0.0:0 > >>>> 15/09/08 14:35:24 INFO webapp.WebServer: Instantiating NMWebApp at > >>>> 0.0.0.0:8042 > >>>> 15/09/08 14:35:24 INFO mortbay.log: Logging to > >>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via > >>>> org.mortbay.log.Slf4jLog > >>>> 15/09/08 14:35:24 INFO http.HttpRequestLog: Http request log for > >>>> http.requests.nodemanager is not defined > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Added global filter 'safety' > >>>> (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter) > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Added filter > >>>> static_user_filter > >>>> > (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) > >>>> to context node > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Added filter > >>>> static_user_filter > >>>> > 
(class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) > >>>> to context static > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Added filter > >>>> static_user_filter > >>>> > (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) > >>>> to context logs > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: adding path spec: /node/* > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: adding path spec: /ws/* > >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Jetty bound to port 8042 > >>>> 15/09/08 14:35:24 INFO mortbay.log: jetty-6.1.26 > >>>> 15/09/08 14:35:24 INFO mortbay.log: Extract > >>>> > >>>> > jar:file:/tmp/mesos/slaves/20150907-111332-1660987584-5050-8033-S3/frameworks/20150907-111332-1660987584-5050-8033-0003/executors/myriad_executor20150907-111332-1660987584-5050-8033-000320150907-111332-1660987584-5050-8033-O11824820150907-111332-1660987584-5050-8033-S3/runs/67cc8f37-b6d4-4018-a9b4-0071d020c9a5/hadoop-2.7.0/share/hadoop/yarn/hadoop-yarn-common-2.7.0-mapr-1506.jar!/webapps/node > >>>> to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp > >>>> 15/09/08 14:35:25 INFO mortbay.log: Started > >>>> [email protected]:8042 > >>>> 15/09/08 14:35:25 INFO webapp.WebApps: Web app /node started at 8042 > >>>> 15/09/08 14:35:25 INFO webapp.WebApps: Registered webapp guice modules > >>>> 15/09/08 14:35:25 INFO client.RMProxy: Connecting to ResourceManager > >>>> at myriad.marathon.mesos/192.168.0.99:8031 > >>>> 15/09/08 14:35:25 INFO nodemanager.NodeStatusUpdaterImpl: Sending out > >>>> 0 NM container statuses: [] > >>>> 15/09/08 14:35:25 INFO nodemanager.NodeStatusUpdaterImpl: Registering > >>>> with RM using containers :[] > >>>> 15/09/08 14:35:25 INFO security.NMContainerTokenSecretManager: Rolling > >>>> master-key for container-tokens, got key with id 338249572 > >>>> 15/09/08 14:35:25 INFO security.NMTokenSecretManagerInNM: Rolling > >>>> master-key for container-tokens, got key with id -362725484 > >>>> 15/09/08 14:35:25 INFO nodemanager.NodeStatusUpdaterImpl: Registered > >>>> with ResourceManager as hadoopmapr5.brewingintel.com:55449 with total > >>>> resource of <memory:16384, vCores:4, disks:4.0> > >>>> 15/09/08 14:35:25 INFO nodemanager.NodeStatusUpdaterImpl: Notifying > >>>> ContainerManager to unblock new container-requests > >>>> > >>>> > >>>> Except of Myriad logs: > >>>> > >>>> /09/08 14:35:12 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:13 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:15 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 3 > >>>> 15/09/08 14:35:16 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:17 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:18 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:19 INFO handlers.StatusUpdateEventHandler: Status > >>>> Update for task: value: > >>>> "nm.medium.323f6664-11ca-477b-9e6e-41fb7547eacf" > >>>> | state: TASK_FAILED > >>>> 15/09/08 14:35:19 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:19 INFO scheduler.DownloadNMExecutorCLGenImpl: Using > >>>> remote distribution > >>>> 15/09/08 14:35:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: > >>>> Getting Hadoop distribution > >>>> from:maprfs:///mesos/myriad/hadoop-2.7.0.tar.gz > >>>> 15/09/08 14:35:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: > >>>> Getting config 
from:http://myriad.marathon.mesos:8088/conf > >>>> 15/09/08 14:35:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: Slave > >>>> will execute command:sudo tar -zxpf hadoop-2.7.0.tar.gz && sudo chown > >>>> mapr . && cp conf hadoop-2.7.0/etc/hadoop/yarn-site.xml; export > >>>> YARN_HOME=hadoop-2.7.0; sudo -E -u mapr -H env > >>>> YARN_HOME="hadoop-2.7.0" > >>>> YARN_NODEMANAGER_OPTS="-Dnodemanager.resource.io-spindles=4.0 > >>>> -Dyarn.resourcemanager.hostname=myriad.marathon.mesos > >>>> > >>>> > -Dyarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor > >>>> -Dnodemanager.resource.cpu-vcores=4 > >>>> -Dnodemanager.resource.memory-mb=16384 > >>>> -Dmyriad.yarn.nodemanager.address=0.0.0.0:31000 > >>>> -Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31001 > >>>> -Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31002 > >>>> -Dmyriad.mapreduce.shuffle.port=0.0.0.0:31003" $YARN_HOME/bin/yarn > >>>> nodemanager > >>>> 15/09/08 14:35:19 INFO handlers.ResourceOffersEventHandler: Launching > >>>> task: nm.medium.323f6664-11ca-477b-9e6e-41fb7547eacf using offer: > >>>> value: "20150907-111332-1660987584-5050-8033-O118248" > >>>> > >>>> 15/09/08 14:35:20 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 2 > >>>> 15/09/08 14:35:21 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:21 INFO util.AbstractLivelinessMonitor: > >>>> Expired:hadoopmapr5.brewingintel.com:52878 Timed out after 2 secs > >>>> 15/09/08 14:35:21 INFO rmnode.RMNodeImpl: Deactivating Node > >>>> hadoopmapr5.brewingintel.com:52878 as it is now LOST > >>>> 15/09/08 14:35:21 INFO rmnode.RMNodeImpl: > >>>> hadoopmapr5.brewingintel.com:52878 Node Transitioned from RUNNING to > >>>> LOST > >>>> 15/09/08 14:35:21 INFO fair.FairScheduler: Removed node > >>>> hadoopmapr5.brewingintel.com:52878 cluster capacity: <memory:0, > >>>> vCores:0, disks:0.0> > >>>> 15/09/08 14:35:22 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:23 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:25 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 3 > >>>> 15/09/08 14:35:25 INFO util.RackResolver: Resolved > >>>> hadoopmapr5.brewingintel.com to /default-rack > >>>> 15/09/08 14:35:25 INFO resourcemanager.ResourceTrackerService: > >>>> NodeManager from node hadoopmapr5.brewingintel.com(cmPort: 55449 > >>>> httpPort: 8042) registered with capability: <memory:16384, vCores:4, > >>>> disks:4.0>, assigned nodeId hadoopmapr5.brewingintel.com:55449 > >>>> 15/09/08 14:35:25 INFO rmnode.RMNodeImpl: > >>>> hadoopmapr5.brewingintel.com:55449 Node Transitioned from NEW to > >>>> RUNNING > >>>> 15/09/08 14:35:25 INFO fair.FairScheduler: Added node > >>>> hadoopmapr5.brewingintel.com:55449 cluster capacity: <memory:16384, > >>>> vCores:4, disks:4.0> > >>>> 15/09/08 14:35:26 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:27 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:28 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:30 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 3 > >>>> 15/09/08 14:35:31 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:32 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:33 INFO handlers.ResourceOffersEventHandler: Received > >>>> 
offers 1 > >>>> 15/09/08 14:35:35 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 3 > >>>> 15/09/08 14:35:36 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:37 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:38 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:40 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 3 > >>>> 15/09/08 14:35:41 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:42 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:43 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:45 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 3 > >>>> 15/09/08 14:35:46 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:47 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:48 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:50 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 3 > >>>> 15/09/08 14:35:51 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:52 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:53 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:55 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 3 > >>>> 15/09/08 14:35:56 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:57 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:35:58 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:00 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 3 > >>>> 15/09/08 14:36:01 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:02 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:03 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:05 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 3 > >>>> 15/09/08 14:36:06 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:07 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:08 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:10 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 3 > >>>> 15/09/08 14:36:11 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:12 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:13 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:15 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 3 > >>>> 15/09/08 14:36:16 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:17 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:18 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:19 INFO handlers.StatusUpdateEventHandler: Status > >>>> Update for task: value: > >>>> "nm.medium.323f6664-11ca-477b-9e6e-41fb7547eacf" > >>>> | state: TASK_FAILED > >>>> 15/09/08 14:36:19 INFO 
handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:19 INFO scheduler.DownloadNMExecutorCLGenImpl: Using > >>>> remote distribution > >>>> 15/09/08 14:36:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: > >>>> Getting Hadoop distribution > >>>> from:maprfs:///mesos/myriad/hadoop-2.7.0.tar.gz > >>>> 15/09/08 14:36:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: > >>>> Getting config from:http://myriad.marathon.mesos:8088/conf > >>>> 15/09/08 14:36:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: Slave > >>>> will execute command:sudo tar -zxpf hadoop-2.7.0.tar.gz && sudo chown > >>>> mapr . && cp conf hadoop-2.7.0/etc/hadoop/yarn-site.xml; export > >>>> YARN_HOME=hadoop-2.7.0; sudo -E -u mapr -H env > >>>> YARN_HOME="hadoop-2.7.0" > >>>> YARN_NODEMANAGER_OPTS="-Dnodemanager.resource.io-spindles=4.0 > >>>> -Dyarn.resourcemanager.hostname=myriad.marathon.mesos > >>>> > >>>> > -Dyarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor > >>>> -Dnodemanager.resource.cpu-vcores=4 > >>>> -Dnodemanager.resource.memory-mb=16384 > >>>> -Dmyriad.yarn.nodemanager.address=0.0.0.0:31000 > >>>> -Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31001 > >>>> -Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31002 > >>>> -Dmyriad.mapreduce.shuffle.port=0.0.0.0:31003" $YARN_HOME/bin/yarn > >>>> nodemanager > >>>> 15/09/08 14:36:19 INFO handlers.ResourceOffersEventHandler: Launching > >>>> task: nm.medium.323f6664-11ca-477b-9e6e-41fb7547eacf using offer: > >>>> value: "20150907-111332-1660987584-5050-8033-O118392" > >>>> > >>>> 15/09/08 14:36:20 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 2 > >>>> 15/09/08 14:36:20 INFO util.AbstractLivelinessMonitor: > >>>> Expired:hadoopmapr5.brewingintel.com:55449 Timed out after 2 secs > >>>> 15/09/08 14:36:20 INFO rmnode.RMNodeImpl: Deactivating Node > >>>> hadoopmapr5.brewingintel.com:55449 as it is now LOST > >>>> 15/09/08 14:36:20 INFO rmnode.RMNodeImpl: > >>>> hadoopmapr5.brewingintel.com:55449 Node Transitioned from RUNNING to > >>>> LOST > >>>> 15/09/08 14:36:20 INFO fair.FairScheduler: Removed node > >>>> hadoopmapr5.brewingintel.com:55449 cluster capacity: <memory:0, > >>>> vCores:0, disks:0.0> > >>>> 15/09/08 14:36:22 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 2 > >>>> 15/09/08 14:36:23 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:24 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:25 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 2 > >>>> 15/09/08 14:36:25 INFO util.RackResolver: Resolved > >>>> hadoopmapr5.brewingintel.com to /default-rack > >>>> 15/09/08 14:36:25 INFO resourcemanager.ResourceTrackerService: > >>>> NodeManager from node hadoopmapr5.brewingintel.com(cmPort: 40378 > >>>> httpPort: 8042) registered with capability: <memory:16384, vCores:4, > >>>> disks:4.0>, assigned nodeId hadoopmapr5.brewingintel.com:40378 > >>>> 15/09/08 14:36:25 INFO rmnode.RMNodeImpl: > >>>> hadoopmapr5.brewingintel.com:40378 Node Transitioned from NEW to > >>>> RUNNING > >>>> 15/09/08 14:36:25 INFO fair.FairScheduler: Added node > >>>> hadoopmapr5.brewingintel.com:40378 cluster capacity: <memory:16384, > >>>> vCores:4, disks:4.0> > >>>> 15/09/08 14:36:27 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 2 > >>>> 15/09/08 14:36:28 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:29 INFO 
handlers.ResourceOffersEventHandler: Received > >>>> offers 1 > >>>> 15/09/08 14:36:30 INFO handlers.ResourceOffersEventHandler: Received > >>>> offers 2 > >>>> > >>>> > >>>> > >>> > >>> > >> > > > >
