I tried the following to try out Docker with Myriad 0.2.0 rc2, but could not
get past the point where the RM is launched: the NMs fail to launch.

Here is the formatted output for the RM docker launch:

root@qa101-139:~/myriad/myriad-0.2.0-incubating-rc2/docker# docker run --net=host -v $PWD/dist -v $PWD/config:/usr/local/hadoop/etc/hadoop --name='myriad-resourcemanager' -t sarjeet/myriad

2016-05-23 03:38:21,431 INFO  [main] myriad.Main
(Main.java:initHealthChecks(140)) - Initializing HealthChecks

2016-05-23 03:38:21,445 INFO  [main] myriad.Main
(Main.java:initProfiles(148)) - Initializing Profiles

2016-05-23 03:38:21,450 INFO  [main] scheduler.ServiceProfileManager
(ServiceProfileManager.java:add(40)) - Adding profile zero with CPU: 0.0
and Memory: 0.0

2016-05-23 03:38:21,450 INFO  [main] scheduler.ServiceProfileManager
(ServiceProfileManager.java:add(40)) - Adding profile small with CPU: 2.0
and Memory: 2048.0

2016-05-23 03:38:21,450 INFO  [main] scheduler.ServiceProfileManager
(ServiceProfileManager.java:add(40)) - Adding profile medium with CPU: 4.0
and Memory: 4096.0

2016-05-23 03:38:21,450 INFO  [main] scheduler.ServiceProfileManager
(ServiceProfileManager.java:add(40)) - Adding profile large with CPU: 10.0
and Memory: 12288.0

2016-05-23 03:38:21,451 INFO  [main] myriad.Main
(Main.java:validateNMInstances(175)) - Validating nmInstances..

2016-05-23 03:38:21,451 INFO  [main] myriad.Main
(Main.java:initServiceConfigurations(238)) - Initializing
initServiceConfigurations

2016-05-23 03:38:21,534 INFO  [main] myriad.Main
(Main.java:startMesosDriver(119)) - starting mesosDriver..

2016-05-23 03:38:21,534 INFO  [main] scheduler.MyriadDriverManager
(MyriadDriverManager.java:startDriver(51)) - Starting driver...

2016-05-23 03:38:21,534 INFO  [main] scheduler.MyriadDriver
(MyriadDriver.java:start(49)) - Starting driver

2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@712: Client
environment:zookeeper.version=zookeeper C client 3.4.5

2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@716: Client
environment:host.name=qa101-139

2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@723: Client
environment:os.name=Linux

2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@724: Client
environment:os.arch=3.13.0-57-generic

2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@725: Client
environment:os.version=#95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015

I0523 03:38:21.535646    24 sched.cpp:222] Version: 0.28.1

2016-05-23 03:38:21,535 INFO  [main] scheduler.MyriadDriver
(MyriadDriver.java:start(51)) - Driver started with status: DRIVER_RUNNING

2016-05-23 03:38:21,536 INFO  [main] scheduler.MyriadDriverManager
(MyriadDriverManager.java:startDriver(53)) - Driver started with status:
DRIVER_RUNNING

2016-05-23 03:38:21,536 INFO  [main] myriad.Main
(Main.java:startMesosDriver(121)) - started mesosDriver..

2016-05-23 03:38:21,536:7(0x7f4e8d685700):ZOO_INFO@check_events@1703:
initiated connection to server [10.10.101.139:5181]

2016-05-23 03:38:21,536 INFO  [main] interceptor.CompositeInterceptor
(CompositeInterceptor.java:register(74)) - Registered
org.apache.myriad.policy.LeastAMNodesFirstPolicy into the registry.

2016-05-23 03:38:21,539 INFO  [main] myriad.Main
(Main.java:startNMInstances(226)) - Launching 1 NM(s) with profile medium

2016-05-23 03:38:21,540 INFO  [main] scheduler.MyriadOperations
(MyriadOperations.java:flexUpCluster(80)) - Adding 1 NM instances to cluster

2016-05-23 03:38:21,555:7(0x7f4e8d685700):ZOO_INFO@check_events@1750:
session establishment complete on server [10.10.101.139:5181],
sessionId=0x15314ddb816d02b, negotiated timeout=10000

I0523 03:38:21.555871    97 group.cpp:349] Group process (group(1)@
10.10.101.139:57196) connected to ZooKeeper

I0523 03:38:21.555945    97 group.cpp:831] Syncing group operations: queue
size (joins, cancels, datas) = (0, 0, 0)

I0523 03:38:21.555979    97 group.cpp:427] Trying to create path '/mesos'
in ZooKeeper

I0523 03:38:21.557073    98 detector.cpp:152] Detected a new leader:
(id='1')

I0523 03:38:21.557235    75 group.cpp:700] Trying to get
'/mesos/json.info_0000000001' in ZooKeeper

I0523 03:38:21.558002    97 detector.cpp:479] A new leading master (UPID=
master@10.10.101.139:5050) is detected

I0523 03:38:21.558116    76 sched.cpp:326] New master detected at
master@10.10.101.139:5050

I0523 03:38:21.558442    76 sched.cpp:336] No credentials provided.
Attempting to register without authentication

I0523 03:38:21.559672    93 sched.cpp:703] Framework registered with
8114e114-db5f-4faa-afb3-ba1ae29e6368-0023

2016-05-23 03:38:21,630 INFO  [pool-2-thread-1]
handlers.RegisteredEventHandler (RegisteredEventHandler.java:onEvent(41)) -
Received event: org.apache.myriad.scheduler.event.RegisteredEvent@27492832
with frameworkId: value: "8114e114-db5f-4faa-afb3-ba1ae29e6368-0023"

2016-05-23 03:38:21,691 INFO  [main] state.SchedulerState
(SchedulerState.java:addNodes(77)) - Marked taskId
nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 pending, size of pending
queue for nm is: 0

2016-05-23 03:38:21,702 INFO  [pool-2-thread-3]
handlers.ResourceOffersEventHandler
(ResourceOffersEventHandler.java:onEvent(100)) - Received offers 1

2016-05-23 03:38:21,702 INFO  [main] scheduler.MyriadOperations
(MyriadOperations.java:flexUpAService(139)) - Adding 1 jobhistory instances
to cluster

2016-05-23 03:38:21,708 INFO  [main] state.SchedulerState
(SchedulerState.java:addNodes(77)) - Marked taskId
jobhistory.jobhistory.78b0a6a2-a47d-412f-869d-6fd9872ef85a pending, size of
pending queue for jobhistory is: 0

2016-05-23 03:38:21,713 INFO  [main]
interceptor.MyriadInitializationInterceptor
(MyriadInitializationInterceptor.java:init(54)) - Initialized myriad.

2016-05-23 03:38:21,714 WARN  [pool-2-thread-3]
handlers.ResourceOffersEventHandler
(ResourceOffersEventHandler.java:matches(198)) - Ignoring unknown resource
type: dfsio_spindles

2016-05-23 03:38:21,724 INFO  [pool-2-thread-3]
scheduler.TaskFactory$NMTaskFactoryImpl
(TaskFactory.java:getCommandInfo(138)) - Getting Hadoop distribution from:
http://172.17.0.2:8192/api/config.tgz

2016-05-23 03:38:21,749 INFO  [main] ipc.CallQueueManager
(CallQueueManager.java:<init>(53)) - Using callQueue class
java.util.concurrent.LinkedBlockingQueue

2016-05-23 03:38:21,758 INFO  [Socket Reader #1 for port 8031] ipc.Server
(Server.java:run(606)) - Starting Socket Reader #1 for port 8031

2016-05-23 03:38:21,774 INFO  [main] pb.RpcServerFactoryPBImpl
(RpcServerFactoryPBImpl.java:createServer(174)) - Adding protocol
org.apache.hadoop.yarn.server.api.ResourceTrackerPB to the server

2016-05-23 03:38:21,775 INFO  [IPC Server Responder] ipc.Server
(Server.java:run(836)) - IPC Server Responder: starting

2016-05-23 03:38:21,775 INFO  [IPC Server listener on 8031] ipc.Server
(Server.java:run(676)) - IPC Server listener on 8031: starting

2016-05-23 03:38:21,786 INFO  [pool-2-thread-3]
handlers.ResourceOffersEventHandler
(ResourceOffersEventHandler.java:onEvent(139)) - Launching task:
nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 using offer: value:
"8114e114-db5f-4faa-afb3-ba1ae29e6368-O98094"


2016-05-23 03:38:21,802 INFO  [main] ipc.CallQueueManager
(CallQueueManager.java:<init>(53)) - Using callQueue class
java.util.concurrent.LinkedBlockingQueue

2016-05-23 03:38:21,803 INFO  [pool-2-thread-9]
handlers.ExecutorLostEventHandler
(ExecutorLostEventHandler.java:onEvent(39)) - Executor value:
"myriad_executor8114e114-db5f-4faa-afb3-ba1ae29e6368-00238114e114-db5f-4faa-afb3-ba1ae29e6368-O980948114e114-db5f-4faa-afb3-ba1ae29e6368-S1"

 of slave value: "8114e114-db5f-4faa-afb3-ba1ae29e6368-S1"

 lost with exit status: -1

2016-05-23 03:38:21,809 INFO  [Socket Reader #1 for port 8030] ipc.Server
(Server.java:run(606)) - Starting Socket Reader #1 for port 8030

2016-05-23 03:38:21,824 INFO  [pool-2-thread-5]
handlers.StatusUpdateEventHandler
(StatusUpdateEventHandler.java:onEvent(60)) - Status Update for task:
nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 | state: TASK_FAILED

2016-05-23 03:38:21,852 INFO  [main] pb.RpcServerFactoryPBImpl
(RpcServerFactoryPBImpl.java:createServer(174)) - Adding protocol
org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB to the server

2016-05-23 03:38:21,853 INFO  [IPC Server Responder] ipc.Server
(Server.java:run(836)) - IPC Server Responder: starting

2016-05-23 03:38:21,853 INFO  [IPC Server listener on 8030] ipc.Server
(Server.java:run(676)) - IPC Server listener on 8030: starting

2016-05-23 03:38:22,504 INFO  [IPC Server listener on 8033] ipc.Server
(Server.java:run(676)) - IPC Server listener on 8033: starting

2016-05-23 03:38:22,755 INFO  [pool-2-thread-3]
handlers.ResourceOffersEventHandler
(ResourceOffersEventHandler.java:onEvent(100)) - Received offers 1

2016-05-23 03:38:22,755 WARN  [pool-2-thread-3]
handlers.ResourceOffersEventHandler
(ResourceOffersEventHandler.java:matches(198)) - Ignoring unknown resource
type: dfsio_spindles

2016-05-23 03:38:22,756 INFO  [pool-2-thread-3]
scheduler.TaskFactory$NMTaskFactoryImpl
(TaskFactory.java:getCommandInfo(138)) - Getting Hadoop distribution from:
http://172.17.0.2:8192/api/config.tgz

2016-05-23 03:38:22,757 INFO  [pool-2-thread-3]
handlers.ResourceOffersEventHandler
(ResourceOffersEventHandler.java:onEvent(139)) - Launching task:
nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 using offer: value:
"8114e114-db5f-4faa-afb3-ba1ae29e6368-O98095"


2016-05-23 03:38:22,768 INFO  [pool-2-thread-9]
handlers.ExecutorLostEventHandler
(ExecutorLostEventHandler.java:onEvent(39)) - Executor value:
"myriad_executor8114e114-db5f-4faa-afb3-ba1ae29e6368-00238114e114-db5f-4faa-afb3-ba1ae29e6368-O980958114e114-db5f-4faa-afb3-ba1ae29e6368-S1"

 of slave value: "8114e114-db5f-4faa-afb3-ba1ae29e6368-S1"

 lost with exit status: -1

2016-05-23 03:38:22,773 INFO  [pool-2-thread-5]
handlers.StatusUpdateEventHandler
(StatusUpdateEventHandler.java:onEvent(60)) - Status Update for task:
nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 | state: TASK_FAILED

====================================

To debug the above failure, I checked the NM task stdout/stderr but could not
get any logs (see attached screenshot).
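
A minimal sketch for tallying the task states reported in the RM output above.
`count_task_states` is a hypothetical helper, not part of Myriad, and the regex
assumes the "Status Update for task: ... | state: ..." message format seen in
this log (including the line wrapping added by the mail client):

```python
import re
from collections import Counter

# Assumed message format, taken from the RM log in this post:
#   Status Update for task: <task id> | state: <STATE>
# \s+ also matches the newline the mail client inserted after "task:".
STATUS_RE = re.compile(r"Status Update for task:\s+(\S+)\s+\|\s+state:\s+(\w+)")


def count_task_states(log_text):
    """Return a Counter keyed by (task_id, state) for every status update found."""
    return Counter(STATUS_RE.findall(log_text))


# Two status updates copied from the log above.
sample = """\
(StatusUpdateEventHandler.java:onEvent(60)) - Status Update for task:
nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 | state: TASK_FAILED

(StatusUpdateEventHandler.java:onEvent(60)) - Status Update for task:
nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 | state: TASK_FAILED
"""

for (task_id, state), n in count_task_states(sample).items():
    print(task_id, state, n)
```

Run against the full RM output, this only shows how often each task changed
state; it does not explain why the executor was lost.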

Then, after attaching to the running container, I found the following:

root@qa101-139:~/myriad/myriad-0.2.0-incubating-rc2/docker# docker exec -it 8f730dc2de1a /bin/bash

yarn@qa101-139:/$

yarn@qa101-139:/$ ps -ef

UID        PID  PPID  C STIME TTY          TIME CMD

yarn         1     0  0 03:38 ?        00:00:00 /bin/sh -c
/usr/local/hadoop/bin/yarn resourcemanager

yarn         7     1 14 03:38 ?        00:00:14 /usr/bin/java
-Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/usr/local/hadoop/logs
-Dyarn.log.dir=/usr/l

yarn       296     0  0 03:39 ?        00:00:00 /bin/bash

yarn       302     0  0 03:39 ?        00:00:00 /bin/bash

yarn       309   302  0 03:39 ?        00:00:00 ps -ef

yarn@qa101-139:/$ ls -l /usr/local/hadoop/

total 52

-rw-r--r-- 1 root root 15429 Apr 10  2015 LICENSE.txt

-rw-r--r-- 1 root root   101 Apr 10  2015 NOTICE.txt

-rw-r--r-- 1 root root  1366 Apr 10  2015 README.txt

drwxr-xr-x 2 root root  4096 May 22 19:22 bin

drwxr-xr-x 6 root root  4096 May 22 20:42 etc

drwxr-xr-x 2 root root  4096 May 22 19:22 include

drwxr-xr-x 4 root root  4096 May 22 19:22 lib

drwxr-xr-x 2 root root  4096 May 22 19:22 libexec

drwxr-xr-x 2 root root  4096 May 22 19:22 sbin

drwxr-xr-x 7 root root  4096 May 22 20:42 share

yarn@qa101-139:/$ ls -l /usr/local/hadoop/etc/hadoop/

total 16

-rw-r--r-- 1 root root 1340 May 22 20:02 mapred-site.xml

-rw-r--r-- 1 root root 3395 May 23 03:38 myriad-config-default.yml

-rw-r--r-- 1 root root 4207 May 23 00:27 yarn-site.xml

yarn@qa101-139:/$ cat /usr/local/hadoop/etc/hadoop/myriad-config-default.yml

mesosMaster: zk://10.10.101.139:5181/mesos   ->> (Running on the host outside of the container)

#Container information for the node managers

containerInfo:

    type: DOCKER

    dockerInfo:

        image: sarjeet/myriad

    volumes:

        -

          containerPath: /tmp

          hostPath: /tmp

checkpoint: false

frameworkFailoverTimeout: 43200000

frameworkName: MyriadAlpha

frameworkRole:

frameworkUser: mapr

                          # running the resource manager.

frameworkSuperUser: root  # To be depricated, currently permissions need set by a superuser due to Mesos-1790.  Must be

                          # root or have passwordless sudo. Required if nodeManagerURI set, ignored otherwise.

nativeLibrary: /usr/local/lib/libmesos.so

zkServers: 10.10.101.139:5181

zkTimeout: 20000

restApiPort: 8192

profiles:

  zero:  # NMs launched with this profile dynamically obtain cpu/mem from Mesos

    cpu: 0

    mem: 0

    spindles: 0

  small:

    cpu: 2

    mem: 2048

    spindles: 1

  medium:

    cpu: 4

    mem: 4096

    spindles: 2

  large:

    cpu: 10

    mem: 12288

    spindles: 4

nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero profile.

  medium: 1 # <profile_name : instances>

rebalancer: false

haEnabled: true

servedConfigPath: /dist/config.tgz

nodemanager:

  jvmMaxMemoryMB: 1024

  cpus: 0.2

  cgroups: true

executor:

  jvmMaxMemoryMB: 256

  configUri: http://172.17.0.1:8192/api/config.tgz

  path: file:///opt/mapr/myriad/myriad-0.1/lib/myriad-executor-runnable-0.1.0.jar

  #The following should be used for a remotely distributed URI, hdfs assumed but other URI types valid.

  #nodeManagerUri: hdfs://namenode:port/dist/hadoop-2.7.0.tar.gz

  #path: file:///opt/mapr/myriad/myriad-0.1/lib/myriad-executor-runnable-0.1.0.jar

yarnEnvironment:

  YARN_NODEMANAGER_OPTS: -Dcluster.name.prefix=/cluster1 -Dnodemanager.resource.io-spindles=4.0

  YARN_HOME: /opt/mapr/hadoop/hadoop-2.7.0

  HADOOP_CONF_DIR: /mnt/mesos/sandbox/config

  HADOOP_TMP_DIR: /tmp

  HADOOP_LOG_DIR: /mnt/mesos/sandbox

  #JAVA_HOME: /usr/lib/jvm/java-default #System dependent, but sometimes necessary

mesosAuthenticationPrincipal:

mesosAuthenticationSecretFilename:

yarn@qa101-139:/$ netstat -anlp | grep 8088

tcp6       0      0 10.10.101.139:8088      :::*                    LISTEN      7/java

yarn@qa101-139:/$ netstat -anlp | grep 8192

tcp6       0      0 :::8192                 :::*                    LISTEN      7/java

yarn@qa101-139:/$

====================================

Reference: https://github.com/apache/incubator-myriad/tree/master/docker

I may not have dug deep enough to see whether there is a configuration issue
in launching the Docker RM, but if there is a trivial fix or a config setting
I missed, I can give this another try. Please let me know if I missed
anything.
- Sarjeet Singh
