Re: Getting Nodes to be "Running" in Mesos

Darin Johnson Tue, 08 Sep 2015 19:30:55 -0700

Yuliya, the reason for the chown framework user . is that the the executor
(as frameworkUser) must write some files the the MESOS_DIRECTORY,
specifically stderr, stdout and at the time the capsule dir (now
obsolete).  I suppose we could touch these files and then give them the
proper permissions.


I was planning to remove a lot of the code once MESOS-1790 is resolved, Jim
submitted a patch already.  In particular, there would no longer be a
frameworkSuperUser (it's there so we can extract the tarball and preserve
ownership/permissions for container-executor), and the frameworkUser would
just run the yarn nodemanger.  If we continue to require the
MESOS_DIRECTORY to be owned by root and we'll be required to continue to
run it in a way similar to it is currently.  I really don't like the idea
of running frameworks as root or even with passwordless sudo if I can help
it, but at the time it was the only work around.

So I guess the question is frameworkSuperUser something that we'd like to
eventually depricate or is it here for good?  Also, I should comment on
Mesos-1790 to see what's going on with the patch.

Darin



On Sep 8, 2015 7:12 PM, "yuliya Feldman" <[email protected]>
wrote:

> John,
> It is a problem with permissions for container-executor.cfg - it requires
> whole path to it to be owned by root.
> One step is to change work-dir for mesos-slave to point to a different
> directory (not tmp) that is writable only by root.
> It still does not solve full issue since binary distro is changing
> permissions of the distro directory to a framework user.
> If framework user is root and myriad is running as root it can be solved,
> otherwise we need changes to binary distro code.
> I was planning to do it, but got distracted by other stuff. Will try to
> look at it this week.
> Thanks,Yuliya
>       From: John Omernik <[email protected]>
>  To: [email protected]; yuliya Feldman <[email protected]>
>  Sent: Tuesday, September 8, 2015 1:31 PM
>  Subject: Re: Getting Nodes to be "Running" in Mesos
>
> interesting... when I did root as the framework user then I got this:
>
> ExitCodeException exitCode=24: File /tmp must not be world or group
> writable, but is 1777
>
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
>     at org.apache.hadoop.util.Shell.run(Shell.java:456)
>     at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
>     at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
>     at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
> 15/09/08 15:30:38 INFO nodemanager.ContainerExecutor:
> 15/09/08 15:30:38 INFO service.AbstractService: Service NodeManager
> failed in state INITED; cause:
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> initialize container executor
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> initialize container executor
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
>     at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
> Caused by: java.io.IOException: Linux container executor not
> configured properly (error=24)
>     at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188)
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
>     ... 3 more
> Caused by: ExitCodeException exitCode=24: File /tmp must not be world
> or group writable, but is 1777
>
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
>     at org.apache.hadoop.util.Shell.run(Shell.java:456)
>     at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
>     at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
>     ... 4 more
> 15/09/08 15:30:38 WARN service.AbstractService: When stopping the
> service NodeManager : java.lang.NullPointerException
> java.lang.NullPointerException
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:162)
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:274)
>     at
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
>     at
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
>     at
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
>     at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
> 15/09/08 15:30:38 FATAL nodemanager.NodeManager: Error starting NodeManager
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> initialize container executor
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
>     at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
> Caused by: java.io.IOException: Linux container executor not
> configured properly (error=24)
>     at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188)
>     at
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
>     ... 3 more
> Caused by: ExitCodeException exitCode=24: File /tmp must not be world
> or group writable, but is 1777
>
>     at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
>     at org.apache.hadoop.util.Shell.run(Shell.java:456)
>     at
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
>     at
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
>     ... 4 more
> 15/09/08 15:30:38 INFO nodemanager.NodeManager: SHUTDOWN_MSG:
>
>
> On Tue, Sep 8, 2015 at 3:26 PM, John Omernik <[email protected]> wrote:
>
> > So some progress: I am getting the error below complaining about
> ownership
> > of files.  In marathon I have user:root on my task, in the myriad
> config, I
> > have  mapr is user 700, so I am unsure on that, I will try with
> > framworkUser being root, see if that works?
> >
> > frameworkUser: mapr # Should be the same user running the resource
> manager.
> >
> > frameworkSuperUser: darkness # Must be root or have passwordless sudo on
> > all nodes!
> >
> >
> >
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
> > ... 3 more
> > Caused by: ExitCodeException exitCode=24: File
> >
> /tmp/mesos/slaves/20150907-111332-1660987584-5050-8033-S1/frameworks/20150907-111332-1660987584-5050-8033-0005/executors/myriad_executor20150907-111332-1660987584-5050-8033-000520150907-111332-1660987584-5050-8033-O12269720150907-111332-1660987584-5050-8033-S1/runs/8c48f443-f768-45b1-8cb2-55ff5b5a99d8
> > must be owned by root, but is owned by 700
> >
> > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
> > at org.apache.hadoop.util.Shell.run(Shell.java:456)
> > at
> > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
> > ... 4 more
> > 15/09/08 15:23:24 WARN service.AbstractService: When stopping the service
> > NodeManager : java.lang.NullPointerException
> > java.lang.NullPointerException
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.stopRecoveryStore(NodeManager.java:162)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStop(NodeManager.java:274)
> > at
> org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
> > at
> >
> org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
> > at
> >
> org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
> > at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
> > 15/09/08 15:23:24 FATAL nodemanager.NodeManager: Error starting
> NodeManager
> > org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to
> > initialize container executor
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:212)
> > at
> org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:463)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:511)
> > Caused by: java.io.IOException: Linux container executor not configured
> > properly (error=24)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:188)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:210)
> > ... 3 more
> > Caused by: ExitCodeException exitCode=24: File
> >
> /tmp/mesos/slaves/20150907-111332-1660987584-5050-8033-S1/frameworks/20150907-111332-1660987584-5050-8033-0005/executors/myriad_executor20150907-111332-1660987584-5050-8033-000520150907-111332-1660987584-5050-8033-O12269720150907-111332-1660987584-5050-8033-S1/runs/8c48f443-f768-45b1-8cb2-55ff5b5a99d8
> > must be owned by root, but is owned by 700
> >
> > at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
> > at org.apache.hadoop.util.Shell.run(Shell.java:456)
> > at
> > org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
> > at
> >
> org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:182)
> > ... 4 more
> > 15/09/08 15:23:24 INFO nodemanager.NodeManager: SHUTDOWN_MSG:
> > /************************************************************
> > SHUTDOWN_MSG: Shutting down NodeManager at
> > hadoopmapr2.brewingintel.com/192.168.0.99
> > ************************************************************/
> >
> > On Tue, Sep 8, 2015 at 3:23 PM, John Omernik <[email protected]> wrote:
> >
> >> Also a side note:  The Flexing up and now having to have at least one
> >> node manager specified at startup:
> >>
> >> nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero
> >> profile.
> >>
> >>  medium: 1 # <profile_name : instances>
> >>
> >>
> >> Is going to lead to task failures with mesos dns because the name won't
> >> be ready right away (1 minute delay after kicking off Myriad) do we
> NEED to
> >> have a non-0 profile nodemanager startup with the resource manager?
> >>
> >> On Tue, Sep 8, 2015 at 3:16 PM, John Omernik <[email protected]> wrote:
> >>
> >>> Cool.  Question about the yarn-site.xml in general.
> >>>
> >>> I was struggling with some things in the wiki on this page:
> >>>
> https://cwiki.apache.org/confluence/display/MYRIAD/Installing+for+Administrators
> >>>
> >>> Basically in step 5:
> >>> Step 5: Configure YARN to use Myriad
> >>>
> >>> Modify the */opt/hadoop-2.7.0/etc/hadoop/yarn-site.xml* file as
> >>> instructed in Sample: myriad-config-default.yml
> >>> <
> https://cwiki.apache.org/confluence/display/MYRIAD/Sample%3A+myriad-config-default.yml
> >
> >>> .
> >>>
> >>>
> >>> (It should not link to the yml, but to the yarn site, side issue) it
> has
> >>> us put that information in the yarn-site.xml This makes sense.  The
> >>> resource manager needs to be aware of the myriad stuff.
> >>>
> >>> Then I go to create a tarbal, (which I SHOULD be able to use for both
> >>> resource manager and nodemanager... right?) However, the instructions
> state
> >>> to remove the *.xml files.
> >>>
> >>> Step 6: Create the Tarball
> >>>
> >>> The tarball has all of the files needed for the Node Managers and
> >>> Resource Managers. The following shows how to create the tarball and
> place
> >>> it in HDFS:
> >>> cd ~
> >>> sudo cp -rp /opt/hadoop-2.7.0 .
> >>> sudo rm hadoop-2.7.0/etc/hadoop/*.xml
> >>> sudo tar -zcpf ~/hadoop-2.7.0.tar.gz hadoop-2.7.0
> >>> hadoop fs -put ~/hadoop-2.7.0.tar.gz /dist
> >>>
> >>>
> >>> What I ended up doing... since I am running the resourcemanager
> (myriad)
> >>> in marathon, is I created two tarballs. One is my
> hadoop-2.7.0-RM.tar.gz
> >>> which has the all the xml files still in the tar ball for shipping to
> >>> marathon. Then other is hadoop-2.7.0-NM.tar.gz which per the
> instructions
> >>> removes the *.xml files from the /etc/hadoop/ directory.
> >>>
> >>>
> >>> I guess... my logic is that myriad creates the conf directory for the
> >>> nodemanagers... but then I thought, and I overthinking something? Am I
> >>> missing something? Could that be factoring into what I am doing here?
> >>>
> >>>
> >>> Obviously my first steps are to add the extra yarn-site.xml entries,
> but
> >>> in this current setup, they are only going into the resource manager
> >>> yarn-site as the the node-managers don't have a yarn-site in their
> >>> directories.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Tue, Sep 8, 2015 at 3:09 PM, yuliya Feldman <
> >>> [email protected]> wrote:
> >>>
> >>>> Take a look at :  https://github.com/mesos/myriad/pull/128
> >>>> for yarn-site.xml updates
> >>>>
> >>>>      From: John Omernik <[email protected]>
> >>>>  To: [email protected]
> >>>>  Sent: Tuesday, September 8, 2015 12:38 PM
> >>>>  Subject: Getting Nodes to be "Running" in Mesos
> >>>>
> >>>> So I am playing around with a recent build of Myriad, and I am using
> >>>> MapR
> >>>> 5.0 (hadoop-2.7.0) I hate to use the dev list as a "help Myriad won't
> >>>> run"
> >>>> forum, so please forgive me if I am using the list wrong.
> >>>>
> >>>> Basically, I seem to be able to get myriad running, and the things up,
> >>>> and
> >>>> it tries to start a nodemanager.
> >>>>
> >>>> In mesos, the status of the nodemanager task never gets past staging,
> >>>> and
> >>>> eventually, fails.  The logs for both the node manager and myriad,
> seem
> >>>> to
> >>>> look healthy, and I am not sure where I should look next to
> troubleshoot
> >>>> what is happening. Basically you can see the registration of the
> >>>> nodemanager, and then it fails with no error in the logs... Any
> thoughts
> >>>> would be appreciated on where I can look next for troubleshooting.
> >>>>
> >>>>
> >>>> Node Manager Logs (complete)
> >>>>
> >>>> STARTUP_MSG:  build = [email protected]:mapr/private-hadoop-common.git
> >>>> -r fc95119f587541fb3a9af0dbeeed23c974178115; compiled by 'root' on
> >>>> 2015-08-19T20:02Z
> >>>> STARTUP_MSG:  java = 1.8.0_45-internal
> >>>> ************************************************************/
> >>>> 15/09/08 14:35:23 INFO nodemanager.NodeManager: registered UNIX signal
> >>>> handlers for [TERM, HUP, INT]
> >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
> >>>>
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.container.ContainerEventType
> >>>> for class
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ContainerEventDispatcher
> >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
> >>>>
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.application.ApplicationEventType
> >>>> for class
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl$ApplicationEventDispatcher
> >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
> >>>>
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizationEventType
> >>>> for class
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService
> >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
> >>>>
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServicesEventType
> >>>> for class
> >>>> org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices
> >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
> >>>>
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorEventType
> >>>> for class
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl
> >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
> >>>>
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncherEventType
> >>>> for class
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainersLauncher
> >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
> >>>> org.apache.hadoop.yarn.server.nodemanager.ContainerManagerEventType
> >>>> for class
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
> >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
> >>>> org.apache.hadoop.yarn.server.nodemanager.NodeManagerEventType for
> >>>> class org.apache.hadoop.yarn.server.nodemanager.NodeManager
> >>>> 15/09/08 14:35:24 INFO impl.MetricsConfig: loaded properties from
> >>>> hadoop-metrics2.properties
> >>>> 15/09/08 14:35:24 INFO impl.MetricsSystemImpl: Scheduled snapshot
> >>>> period at 10 second(s).
> >>>> 15/09/08 14:35:24 INFO impl.MetricsSystemImpl: NodeManager metrics
> >>>> system started
> >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
> >>>>
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.event.LogHandlerEventType
> >>>> for class
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler
> >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
> >>>>
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadEventType
> >>>> for class
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.sharedcache.SharedCacheUploadService
> >>>> 15/09/08 14:35:24 INFO localizer.ResourceLocalizationService: per
> >>>> directory file limit = 8192
> >>>> 15/09/08 14:35:24 INFO localizer.ResourceLocalizationService:
> >>>> usercache path :
> >>>> file:///tmp/hadoop-mapr/nm-local-dir/usercache_DEL_1441740924753
> >>>> 15/09/08 14:35:24 INFO event.AsyncDispatcher: Registering class
> >>>>
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.event.LocalizerEventType
> >>>> for class
> >>>>
> org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker
> >>>> 15/09/08 14:35:24 WARN containermanager.AuxServices: The Auxilurary
> >>>> Service named 'mapreduce_shuffle' in the configuration is for class
> >>>> org.apache.hadoop.mapred.ShuffleHandler which has a name of
> >>>> 'httpshuffle'. Because these are not the same tools trying to send
> >>>> ServiceData and read Service Meta Data may have issues unless the
> >>>> refer to the name in the config.
> >>>> 15/09/08 14:35:24 INFO containermanager.AuxServices: Adding auxiliary
> >>>> service httpshuffle, "mapreduce_shuffle"
> >>>> 15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl:  Using
> >>>> ResourceCalculatorPlugin :
> >>>> org.apache.hadoop.yarn.util.LinuxResourceCalculatorPlugin@1a5b6f42
> >>>> 15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl:  Using
> >>>> ResourceCalculatorProcessTree : null
> >>>> 15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl: Physical memory
> >>>> check enabled: true
> >>>> 15/09/08 14:35:24 INFO monitor.ContainersMonitorImpl: Virtual memory
> >>>> check enabled: false
> >>>> 15/09/08 14:35:24 INFO nodemanager.NodeStatusUpdaterImpl: Initialized
> >>>> nodemanager for null: physical-memory=16384 virtual-memory=34407
> >>>> virtual-cores=4 disks=4.0
> >>>> 15/09/08 14:35:24 INFO ipc.CallQueueManager: Using callQueue class
> >>>> java.util.concurrent.LinkedBlockingQueue
> >>>> 15/09/08 14:35:24 INFO ipc.Server: Starting Socket Reader #1 for port
> >>>> 55449
> >>>> 15/09/08 14:35:24 INFO pb.RpcServerFactoryPBImpl: Adding protocol
> >>>> org.apache.hadoop.yarn.api.ContainerManagementProtocolPB to the server
> >>>> 15/09/08 14:35:24 INFO containermanager.ContainerManagerImpl: Blocking
> >>>> new container-requests as container manager rpc server is still
> >>>> starting.
> >>>> 15/09/08 14:35:24 INFO ipc.Server: IPC Server Responder: starting
> >>>> 15/09/08 14:35:24 INFO ipc.Server: IPC Server listener on 55449:
> >>>> starting
> >>>> 15/09/08 14:35:24 INFO security.NMContainerTokenSecretManager:
> >>>> Updating node address : hadoopmapr5.brewingintel.com:55449
> >>>> 15/09/08 14:35:24 INFO ipc.CallQueueManager: Using callQueue class
> >>>> java.util.concurrent.LinkedBlockingQueue
> >>>> 15/09/08 14:35:24 INFO ipc.Server: Starting Socket Reader #1 for port
> >>>> 8040
> >>>> 15/09/08 14:35:24 INFO pb.RpcServerFactoryPBImpl: Adding protocol
> >>>> org.apache.hadoop.yarn.server.nodemanager.api.LocalizationProtocolPB
> >>>> to the server
> >>>> 15/09/08 14:35:24 INFO ipc.Server: IPC Server Responder: starting
> >>>> 15/09/08 14:35:24 INFO ipc.Server: IPC Server listener on 8040:
> starting
> >>>> 15/09/08 14:35:24 INFO localizer.ResourceLocalizationService:
> >>>> Localizer started on port 8040
> >>>> 15/09/08 14:35:24 INFO mapred.IndexCache: IndexCache created with max
> >>>> memory = 10485760
> >>>> 15/09/08 14:35:24 INFO mapred.ShuffleHandler: httpshuffle listening on
> >>>> port 13562
> >>>> 15/09/08 14:35:24 INFO containermanager.ContainerManagerImpl:
> >>>> ContainerManager started at hadoopmapr5/192.168.0.96:55449
> >>>> 15/09/08 14:35:24 INFO containermanager.ContainerManagerImpl:
> >>>> ContainerManager bound to 0.0.0.0/0.0.0.0:0
> >>>> 15/09/08 14:35:24 INFO webapp.WebServer: Instantiating NMWebApp at
> >>>> 0.0.0.0:8042
> >>>> 15/09/08 14:35:24 INFO mortbay.log: Logging to
> >>>> org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
> >>>> org.mortbay.log.Slf4jLog
> >>>> 15/09/08 14:35:24 INFO http.HttpRequestLog: Http request log for
> >>>> http.requests.nodemanager is not defined
> >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Added global filter 'safety'
> >>>> (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter)
> >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Added filter
> >>>> static_user_filter
> >>>>
> (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
> >>>> to context node
> >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Added filter
> >>>> static_user_filter
> >>>>
> (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
> >>>> to context static
> >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Added filter
> >>>> static_user_filter
> >>>>
> (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter)
> >>>> to context logs
> >>>> 15/09/08 14:35:24 INFO http.HttpServer2: adding path spec: /node/*
> >>>> 15/09/08 14:35:24 INFO http.HttpServer2: adding path spec: /ws/*
> >>>> 15/09/08 14:35:24 INFO http.HttpServer2: Jetty bound to port 8042
> >>>> 15/09/08 14:35:24 INFO mortbay.log: jetty-6.1.26
> >>>> 15/09/08 14:35:24 INFO mortbay.log: Extract
> >>>>
> >>>>
> jar:file:/tmp/mesos/slaves/20150907-111332-1660987584-5050-8033-S3/frameworks/20150907-111332-1660987584-5050-8033-0003/executors/myriad_executor20150907-111332-1660987584-5050-8033-000320150907-111332-1660987584-5050-8033-O11824820150907-111332-1660987584-5050-8033-S3/runs/67cc8f37-b6d4-4018-a9b4-0071d020c9a5/hadoop-2.7.0/share/hadoop/yarn/hadoop-yarn-common-2.7.0-mapr-1506.jar!/webapps/node
> >>>> to /tmp/Jetty_0_0_0_0_8042_node____19tj0x/webapp
> >>>> 15/09/08 14:35:25 INFO mortbay.log: Started
> >>>> [email protected]:8042
> >>>> 15/09/08 14:35:25 INFO webapp.WebApps: Web app /node started at 8042
> >>>> 15/09/08 14:35:25 INFO webapp.WebApps: Registered webapp guice modules
> >>>> 15/09/08 14:35:25 INFO client.RMProxy: Connecting to ResourceManager
> >>>> at myriad.marathon.mesos/192.168.0.99:8031
> >>>> 15/09/08 14:35:25 INFO nodemanager.NodeStatusUpdaterImpl: Sending out
> >>>> 0 NM container statuses: []
> >>>> 15/09/08 14:35:25 INFO nodemanager.NodeStatusUpdaterImpl: Registering
> >>>> with RM using containers :[]
> >>>> 15/09/08 14:35:25 INFO security.NMContainerTokenSecretManager: Rolling
> >>>> master-key for container-tokens, got key with id 338249572
> >>>> 15/09/08 14:35:25 INFO security.NMTokenSecretManagerInNM: Rolling
> >>>> master-key for container-tokens, got key with id -362725484
> >>>> 15/09/08 14:35:25 INFO nodemanager.NodeStatusUpdaterImpl: Registered
> >>>> with ResourceManager as hadoopmapr5.brewingintel.com:55449 with total
> >>>> resource of <memory:16384, vCores:4, disks:4.0>
> >>>> 15/09/08 14:35:25 INFO nodemanager.NodeStatusUpdaterImpl: Notifying
> >>>> ContainerManager to unblock new container-requests
> >>>>
> >>>>
> >>>> Except of Myriad logs:
> >>>>
> >>>> /09/08 14:35:12 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:13 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:15 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 3
> >>>> 15/09/08 14:35:16 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:17 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:18 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:19 INFO handlers.StatusUpdateEventHandler: Status
> >>>> Update for task: value:
> >>>> "nm.medium.323f6664-11ca-477b-9e6e-41fb7547eacf"
> >>>>  | state: TASK_FAILED
> >>>> 15/09/08 14:35:19 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:19 INFO scheduler.DownloadNMExecutorCLGenImpl: Using
> >>>> remote distribution
> >>>> 15/09/08 14:35:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl:
> >>>> Getting Hadoop distribution
> >>>> from:maprfs:///mesos/myriad/hadoop-2.7.0.tar.gz
> >>>> 15/09/08 14:35:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl:
> >>>> Getting config from:http://myriad.marathon.mesos:8088/conf
> >>>> 15/09/08 14:35:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: Slave
> >>>> will execute command:sudo tar -zxpf hadoop-2.7.0.tar.gz && sudo chown
> >>>> mapr . && cp conf hadoop-2.7.0/etc/hadoop/yarn-site.xml; export
> >>>> YARN_HOME=hadoop-2.7.0; sudo -E -u mapr -H env
> >>>> YARN_HOME="hadoop-2.7.0"
> >>>> YARN_NODEMANAGER_OPTS="-Dnodemanager.resource.io-spindles=4.0
> >>>> -Dyarn.resourcemanager.hostname=myriad.marathon.mesos
> >>>>
> >>>>
> -Dyarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor
> >>>> -Dnodemanager.resource.cpu-vcores=4
> >>>> -Dnodemanager.resource.memory-mb=16384
> >>>> -Dmyriad.yarn.nodemanager.address=0.0.0.0:31000
> >>>> -Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31001
> >>>> -Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31002
> >>>> -Dmyriad.mapreduce.shuffle.port=0.0.0.0:31003"  $YARN_HOME/bin/yarn
> >>>> nodemanager
> >>>> 15/09/08 14:35:19 INFO handlers.ResourceOffersEventHandler: Launching
> >>>> task: nm.medium.323f6664-11ca-477b-9e6e-41fb7547eacf using offer:
> >>>> value: "20150907-111332-1660987584-5050-8033-O118248"
> >>>>
> >>>> 15/09/08 14:35:20 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 2
> >>>> 15/09/08 14:35:21 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:21 INFO util.AbstractLivelinessMonitor:
> >>>> Expired:hadoopmapr5.brewingintel.com:52878 Timed out after 2 secs
> >>>> 15/09/08 14:35:21 INFO rmnode.RMNodeImpl: Deactivating Node
> >>>> hadoopmapr5.brewingintel.com:52878 as it is now LOST
> >>>> 15/09/08 14:35:21 INFO rmnode.RMNodeImpl:
> >>>> hadoopmapr5.brewingintel.com:52878 Node Transitioned from RUNNING to
> >>>> LOST
> >>>> 15/09/08 14:35:21 INFO fair.FairScheduler: Removed node
> >>>> hadoopmapr5.brewingintel.com:52878 cluster capacity: <memory:0,
> >>>> vCores:0, disks:0.0>
> >>>> 15/09/08 14:35:22 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:23 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:25 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 3
> >>>> 15/09/08 14:35:25 INFO util.RackResolver: Resolved
> >>>> hadoopmapr5.brewingintel.com to /default-rack
> >>>> 15/09/08 14:35:25 INFO resourcemanager.ResourceTrackerService:
> >>>> NodeManager from node hadoopmapr5.brewingintel.com(cmPort: 55449
> >>>> httpPort: 8042) registered with capability: <memory:16384, vCores:4,
> >>>> disks:4.0>, assigned nodeId hadoopmapr5.brewingintel.com:55449
> >>>> 15/09/08 14:35:25 INFO rmnode.RMNodeImpl:
> >>>> hadoopmapr5.brewingintel.com:55449 Node Transitioned from NEW to
> >>>> RUNNING
> >>>> 15/09/08 14:35:25 INFO fair.FairScheduler: Added node
> >>>> hadoopmapr5.brewingintel.com:55449 cluster capacity: <memory:16384,
> >>>> vCores:4, disks:4.0>
> >>>> 15/09/08 14:35:26 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:27 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:28 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:30 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 3
> >>>> 15/09/08 14:35:31 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:32 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:33 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:35 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 3
> >>>> 15/09/08 14:35:36 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:37 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:38 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:40 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 3
> >>>> 15/09/08 14:35:41 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:42 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:43 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:45 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 3
> >>>> 15/09/08 14:35:46 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:47 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:48 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:50 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 3
> >>>> 15/09/08 14:35:51 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:52 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:53 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:55 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 3
> >>>> 15/09/08 14:35:56 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:57 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:35:58 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:00 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 3
> >>>> 15/09/08 14:36:01 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:02 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:03 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:05 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 3
> >>>> 15/09/08 14:36:06 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:07 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:08 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:10 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 3
> >>>> 15/09/08 14:36:11 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:12 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:13 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:15 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 3
> >>>> 15/09/08 14:36:16 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:17 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:18 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:19 INFO handlers.StatusUpdateEventHandler: Status
> >>>> Update for task: value:
> >>>> "nm.medium.323f6664-11ca-477b-9e6e-41fb7547eacf"
> >>>>  | state: TASK_FAILED
> >>>> 15/09/08 14:36:19 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:19 INFO scheduler.DownloadNMExecutorCLGenImpl: Using
> >>>> remote distribution
> >>>> 15/09/08 14:36:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl:
> >>>> Getting Hadoop distribution
> >>>> from:maprfs:///mesos/myriad/hadoop-2.7.0.tar.gz
> >>>> 15/09/08 14:36:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl:
> >>>> Getting config from:http://myriad.marathon.mesos:8088/conf
> >>>> 15/09/08 14:36:19 INFO scheduler.TaskFactory$NMTaskFactoryImpl: Slave
> >>>> will execute command:sudo tar -zxpf hadoop-2.7.0.tar.gz && sudo chown
> >>>> mapr . && cp conf hadoop-2.7.0/etc/hadoop/yarn-site.xml; export
> >>>> YARN_HOME=hadoop-2.7.0; sudo -E -u mapr -H env
> >>>> YARN_HOME="hadoop-2.7.0"
> >>>> YARN_NODEMANAGER_OPTS="-Dnodemanager.resource.io-spindles=4.0
> >>>> -Dyarn.resourcemanager.hostname=myriad.marathon.mesos
> >>>>
> >>>>
> -Dyarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor
> >>>> -Dnodemanager.resource.cpu-vcores=4
> >>>> -Dnodemanager.resource.memory-mb=16384
> >>>> -Dmyriad.yarn.nodemanager.address=0.0.0.0:31000
> >>>> -Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31001
> >>>> -Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31002
> >>>> -Dmyriad.mapreduce.shuffle.port=0.0.0.0:31003"  $YARN_HOME/bin/yarn
> >>>> nodemanager
> >>>> 15/09/08 14:36:19 INFO handlers.ResourceOffersEventHandler: Launching
> >>>> task: nm.medium.323f6664-11ca-477b-9e6e-41fb7547eacf using offer:
> >>>> value: "20150907-111332-1660987584-5050-8033-O118392"
> >>>>
> >>>> 15/09/08 14:36:20 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 2
> >>>> 15/09/08 14:36:20 INFO util.AbstractLivelinessMonitor:
> >>>> Expired:hadoopmapr5.brewingintel.com:55449 Timed out after 2 secs
> >>>> 15/09/08 14:36:20 INFO rmnode.RMNodeImpl: Deactivating Node
> >>>> hadoopmapr5.brewingintel.com:55449 as it is now LOST
> >>>> 15/09/08 14:36:20 INFO rmnode.RMNodeImpl:
> >>>> hadoopmapr5.brewingintel.com:55449 Node Transitioned from RUNNING to
> >>>> LOST
> >>>> 15/09/08 14:36:20 INFO fair.FairScheduler: Removed node
> >>>> hadoopmapr5.brewingintel.com:55449 cluster capacity: <memory:0,
> >>>> vCores:0, disks:0.0>
> >>>> 15/09/08 14:36:22 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 2
> >>>> 15/09/08 14:36:23 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:24 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:25 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 2
> >>>> 15/09/08 14:36:25 INFO util.RackResolver: Resolved
> >>>> hadoopmapr5.brewingintel.com to /default-rack
> >>>> 15/09/08 14:36:25 INFO resourcemanager.ResourceTrackerService:
> >>>> NodeManager from node hadoopmapr5.brewingintel.com(cmPort: 40378
> >>>> httpPort: 8042) registered with capability: <memory:16384, vCores:4,
> >>>> disks:4.0>, assigned nodeId hadoopmapr5.brewingintel.com:40378
> >>>> 15/09/08 14:36:25 INFO rmnode.RMNodeImpl:
> >>>> hadoopmapr5.brewingintel.com:40378 Node Transitioned from NEW to
> >>>> RUNNING
> >>>> 15/09/08 14:36:25 INFO fair.FairScheduler: Added node
> >>>> hadoopmapr5.brewingintel.com:40378 cluster capacity: <memory:16384,
> >>>> vCores:4, disks:4.0>
> >>>> 15/09/08 14:36:27 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 2
> >>>> 15/09/08 14:36:28 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:29 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 1
> >>>> 15/09/08 14:36:30 INFO handlers.ResourceOffersEventHandler: Received
> >>>> offers 2
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>
> >
>
>

Re: Getting Nodes to be "Running" in Mesos

Reply via email to