AFAIK, I might have a few configuration and setup issues related to Docker. I'll look into it more once I have some time. Overall, though, RC2 looks good, and the Docker issue is not a blocker as such.
Let me know if anyone else manages to get the Docker setup working and is willing to share the story :).

-Sarjeet

On Sun, May 22, 2016 at 8:54 PM, sarjeet singh <ssarjeetsi...@gmail.com> wrote:
> I tried the following to try out Docker from Myriad RC2, but couldn't get past
> the point where the RM is launched; it was not able to launch any NMs.
>
> Here is the formatted output for the RM Docker launch:
>
> root@qa101-139:~/myriad/myriad-0.2.0-incubating-rc2/docker# docker run \
>     --net=host -v $PWD/dist -v $PWD/config:/usr/local/hadoop/etc/hadoop \
>     --name='myriad-resourcemanager' -t sarjeet/myriad
>
> 2016-05-23 03:38:21,431 INFO [main] myriad.Main (Main.java:initHealthChecks(140)) - Initializing HealthChecks
> 2016-05-23 03:38:21,445 INFO [main] myriad.Main (Main.java:initProfiles(148)) - Initializing Profiles
> 2016-05-23 03:38:21,450 INFO [main] scheduler.ServiceProfileManager (ServiceProfileManager.java:add(40)) - Adding profile zero with CPU: 0.0 and Memory: 0.0
> 2016-05-23 03:38:21,450 INFO [main] scheduler.ServiceProfileManager (ServiceProfileManager.java:add(40)) - Adding profile small with CPU: 2.0 and Memory: 2048.0
> 2016-05-23 03:38:21,450 INFO [main] scheduler.ServiceProfileManager (ServiceProfileManager.java:add(40)) - Adding profile medium with CPU: 4.0 and Memory: 4096.0
> 2016-05-23 03:38:21,450 INFO [main] scheduler.ServiceProfileManager (ServiceProfileManager.java:add(40)) - Adding profile large with CPU: 10.0 and Memory: 12288.0
> 2016-05-23 03:38:21,451 INFO [main] myriad.Main (Main.java:validateNMInstances(175)) - Validating nmInstances..
> 2016-05-23 03:38:21,451 INFO [main] myriad.Main (Main.java:initServiceConfigurations(238)) - Initializing initServiceConfigurations
> 2016-05-23 03:38:21,534 INFO [main] myriad.Main (Main.java:startMesosDriver(119)) - starting mesosDriver..
> 2016-05-23 03:38:21,534 INFO [main] scheduler.MyriadDriverManager (MyriadDriverManager.java:startDriver(51)) - Starting driver...
> 2016-05-23 03:38:21,534 INFO [main] scheduler.MyriadDriver (MyriadDriver.java:start(49)) - Starting driver
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@716: Client environment:host.name=qa101-139
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@724: Client environment:os.arch=3.13.0-57-generic
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@725: Client environment:os.version=#95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015
> I0523 03:38:21.535646 24 sched.cpp:222] Version: 0.28.1
> 2016-05-23 03:38:21,535 INFO [main] scheduler.MyriadDriver (MyriadDriver.java:start(51)) - Driver started with status: DRIVER_RUNNING
> 2016-05-23 03:38:21,536 INFO [main] scheduler.MyriadDriverManager (MyriadDriverManager.java:startDriver(53)) - Driver started with status: DRIVER_RUNNING
> 2016-05-23 03:38:21,536 INFO [main] myriad.Main (Main.java:startMesosDriver(121)) - started mesosDriver..
> 2016-05-23 03:38:21,536:7(0x7f4e8d685700):ZOO_INFO@check_events@1703: initiated connection to server [10.10.101.139:5181]
> 2016-05-23 03:38:21,536 INFO [main] interceptor.CompositeInterceptor (CompositeInterceptor.java:register(74)) - Registered org.apache.myriad.policy.LeastAMNodesFirstPolicy into the registry.
> 2016-05-23 03:38:21,539 INFO [main] myriad.Main (Main.java:startNMInstances(226)) - Launching 1 NM(s) with profile medium
> 2016-05-23 03:38:21,540 INFO [main] scheduler.MyriadOperations (MyriadOperations.java:flexUpCluster(80)) - Adding 1 NM instances to cluster
> 2016-05-23 03:38:21,555:7(0x7f4e8d685700):ZOO_INFO@check_events@1750: session establishment complete on server [10.10.101.139:5181], sessionId=0x15314ddb816d02b, negotiated timeout=10000
> I0523 03:38:21.555871 97 group.cpp:349] Group process (group(1)@10.10.101.139:57196) connected to ZooKeeper
> I0523 03:38:21.555945 97 group.cpp:831] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
> I0523 03:38:21.555979 97 group.cpp:427] Trying to create path '/mesos' in ZooKeeper
> I0523 03:38:21.557073 98 detector.cpp:152] Detected a new leader: (id='1')
> I0523 03:38:21.557235 75 group.cpp:700] Trying to get '/mesos/json.info_0000000001' in ZooKeeper
> I0523 03:38:21.558002 97 detector.cpp:479] A new leading master (UPID=master@10.10.101.139:5050) is detected
> I0523 03:38:21.558116 76 sched.cpp:326] New master detected at master@10.10.101.139:5050
> I0523 03:38:21.558442 76 sched.cpp:336] No credentials provided. Attempting to register without authentication
> I0523 03:38:21.559672 93 sched.cpp:703] Framework registered with 8114e114-db5f-4faa-afb3-ba1ae29e6368-0023
> 2016-05-23 03:38:21,630 INFO [pool-2-thread-1] handlers.RegisteredEventHandler (RegisteredEventHandler.java:onEvent(41)) - Received event: org.apache.myriad.scheduler.event.RegisteredEvent@27492832 with frameworkId: value: "8114e114-db5f-4faa-afb3-ba1ae29e6368-0023"
> 2016-05-23 03:38:21,691 INFO [main] state.SchedulerState (SchedulerState.java:addNodes(77)) - Marked taskId nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 pending, size of pending queue for nm is: 0
> 2016-05-23 03:38:21,702 INFO [pool-2-thread-3] handlers.ResourceOffersEventHandler (ResourceOffersEventHandler.java:onEvent(100)) - Received offers 1
> 2016-05-23 03:38:21,702 INFO [main] scheduler.MyriadOperations (MyriadOperations.java:flexUpAService(139)) - Adding 1 jobhistory instances to cluster
> 2016-05-23 03:38:21,708 INFO [main] state.SchedulerState (SchedulerState.java:addNodes(77)) - Marked taskId jobhistory.jobhistory.78b0a6a2-a47d-412f-869d-6fd9872ef85a pending, size of pending queue for jobhistory is: 0
> 2016-05-23 03:38:21,713 INFO [main] interceptor.MyriadInitializationInterceptor (MyriadInitializationInterceptor.java:init(54)) - Initialized myriad.
> 2016-05-23 03:38:21,714 WARN [pool-2-thread-3] handlers.ResourceOffersEventHandler (ResourceOffersEventHandler.java:matches(198)) - Ignoring unknown resource type: dfsio_spindles
> 2016-05-23 03:38:21,724 INFO [pool-2-thread-3] scheduler.TaskFactory$NMTaskFactoryImpl (TaskFactory.java:getCommandInfo(138)) - Getting Hadoop distribution from: http://172.17.0.2:8192/api/config.tgz
> 2016-05-23 03:38:21,749 INFO [main] ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
> 2016-05-23 03:38:21,758 INFO [Socket Reader #1 for port 8031] ipc.Server (Server.java:run(606)) - Starting Socket Reader #1 for port 8031
> 2016-05-23 03:38:21,774 INFO [main] pb.RpcServerFactoryPBImpl (RpcServerFactoryPBImpl.java:createServer(174)) - Adding protocol org.apache.hadoop.yarn.server.api.ResourceTrackerPB to the server
> 2016-05-23 03:38:21,775 INFO [IPC Server Responder] ipc.Server (Server.java:run(836)) - IPC Server Responder: starting
> 2016-05-23 03:38:21,775 INFO [IPC Server listener on 8031] ipc.Server (Server.java:run(676)) - IPC Server listener on 8031: starting
> 2016-05-23 03:38:21,786 INFO [pool-2-thread-3] handlers.ResourceOffersEventHandler (ResourceOffersEventHandler.java:onEvent(139)) - Launching task: nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 using offer: value: "8114e114-db5f-4faa-afb3-ba1ae29e6368-O98094"
> 2016-05-23 03:38:21,802 INFO [main] ipc.CallQueueManager (CallQueueManager.java:<init>(53)) - Using callQueue class java.util.concurrent.LinkedBlockingQueue
> 2016-05-23 03:38:21,803 INFO [pool-2-thread-9] handlers.ExecutorLostEventHandler (ExecutorLostEventHandler.java:onEvent(39)) - Executor value: "myriad_executor8114e114-db5f-4faa-afb3-ba1ae29e6368-00238114e114-db5f-4faa-afb3-ba1ae29e6368-O980948114e114-db5f-4faa-afb3-ba1ae29e6368-S1"
> of slave value: "8114e114-db5f-4faa-afb3-ba1ae29e6368-S1"
> lost with exit status: -1
> 2016-05-23 03:38:21,809 INFO [Socket Reader #1 for port 8030] ipc.Server (Server.java:run(606)) - Starting Socket Reader #1 for port 8030
> 2016-05-23 03:38:21,824 INFO [pool-2-thread-5] handlers.StatusUpdateEventHandler (StatusUpdateEventHandler.java:onEvent(60)) - Status Update for task: nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 | state: TASK_FAILED
> 2016-05-23 03:38:21,852 INFO [main] pb.RpcServerFactoryPBImpl (RpcServerFactoryPBImpl.java:createServer(174)) - Adding protocol org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB to the server
> 2016-05-23 03:38:21,853 INFO [IPC Server Responder] ipc.Server (Server.java:run(836)) - IPC Server Responder: starting
> 2016-05-23 03:38:21,853 INFO [IPC Server listener on 8030] ipc.Server (Server.java:run(676)) - IPC Server listener on 8030: starting
> 2016-05-23 03:38:22,504 INFO [IPC Server listener on 8033] ipc.Server (Server.java:run(676)) - IPC Server listener on 8033: starting
> 2016-05-23 03:38:22,755 INFO [pool-2-thread-3] handlers.ResourceOffersEventHandler (ResourceOffersEventHandler.java:onEvent(100)) - Received offers 1
> 2016-05-23 03:38:22,755 WARN [pool-2-thread-3] handlers.ResourceOffersEventHandler (ResourceOffersEventHandler.java:matches(198)) - Ignoring unknown resource type: dfsio_spindles
> 2016-05-23 03:38:22,756 INFO [pool-2-thread-3] scheduler.TaskFactory$NMTaskFactoryImpl (TaskFactory.java:getCommandInfo(138)) - Getting Hadoop distribution from: http://172.17.0.2:8192/api/config.tgz
> 2016-05-23 03:38:22,757 INFO [pool-2-thread-3] handlers.ResourceOffersEventHandler (ResourceOffersEventHandler.java:onEvent(139)) - Launching task: nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 using offer: value: "8114e114-db5f-4faa-afb3-ba1ae29e6368-O98095"
> 2016-05-23 03:38:22,768 INFO [pool-2-thread-9] handlers.ExecutorLostEventHandler (ExecutorLostEventHandler.java:onEvent(39)) - Executor value: "myriad_executor8114e114-db5f-4faa-afb3-ba1ae29e6368-00238114e114-db5f-4faa-afb3-ba1ae29e6368-O980958114e114-db5f-4faa-afb3-ba1ae29e6368-S1"
> of slave value: "8114e114-db5f-4faa-afb3-ba1ae29e6368-S1"
> lost with exit status: -1
> 2016-05-23 03:38:22,773 INFO [pool-2-thread-5] handlers.StatusUpdateEventHandler (StatusUpdateEventHandler.java:onEvent(60)) - Status Update for task: nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 | state: TASK_FAILED
>
> ====================================
>
> To debug the above failure, I checked the NM task's stdout/stderr but could not get any logs (see attached screenshot).
>
> Then, after attaching to the running container, I found the following:
>
> root@qa101-139:~/myriad/myriad-0.2.0-incubating-rc2/docker# docker exec -it 8f730dc2de1a /bin/bash
> yarn@qa101-139:/$
> yarn@qa101-139:/$ ps -ef
> UID        PID  PPID  C STIME TTY          TIME CMD
> yarn         1     0  0 03:38 ?        00:00:00 /bin/sh -c /usr/local/hadoop/bin/yarn resourcemanager
> yarn         7     1 14 03:38 ?        00:00:14 /usr/bin/java -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/usr/local/hadoop/logs -Dyarn.log.dir=/usr/l
> yarn       296     0  0 03:39 ?        00:00:00 /bin/bash
> yarn       302     0  0 03:39 ?        00:00:00 /bin/bash
> yarn       309   302  0 03:39 ?        00:00:00 ps -ef
> yarn@qa101-139:/$ ls -l /usr/local/hadoop/
> total 52
> -rw-r--r-- 1 root root 15429 Apr 10  2015 LICENSE.txt
> -rw-r--r-- 1 root root   101 Apr 10  2015 NOTICE.txt
> -rw-r--r-- 1 root root  1366 Apr 10  2015 README.txt
> drwxr-xr-x 2 root root  4096 May 22 19:22 bin
> drwxr-xr-x 6 root root  4096 May 22 20:42 etc
> drwxr-xr-x 2 root root  4096 May 22 19:22 include
> drwxr-xr-x 4 root root  4096 May 22 19:22 lib
> drwxr-xr-x 2 root root  4096 May 22 19:22 libexec
> drwxr-xr-x 2 root root  4096 May 22 19:22 sbin
> drwxr-xr-x 7 root root  4096 May 22 20:42 share
> yarn@qa101-139:/$ ls -l /usr/local/hadoop/etc/hadoop/
> total 16
> -rw-r--r-- 1 root root 1340 May 22 20:02 mapred-site.xml
> -rw-r--r-- 1 root root 3395 May 23 03:38 myriad-config-default.yml
> -rw-r--r-- 1 root root 4207 May 23 00:27 yarn-site.xml
> yarn@qa101-139:/$ cat /usr/local/hadoop/etc/hadoop/myriad-config-default.yml
> mesosMaster: zk://10.10.101.139:5181/mesos ->> (Running on the host outside of the container)
> #Container information for the node managers
> containerInfo:
>   type: DOCKER
>   dockerInfo:
>     image: sarjeet/myriad
>   volumes:
>     -
>       containerPath: /tmp
>       hostPath: /tmp
> checkpoint: false
> frameworkFailoverTimeout: 43200000
> frameworkName: MyriadAlpha
> frameworkRole:
> frameworkUser: mapr
> # running the resource manager.
> frameworkSuperUser: root # To be depricated, currently permissions need set by a superuser due to Mesos-1790. Must be
>                          # root or have passwordless sudo. Required if nodeManagerURI set, ignored otherwise.
> nativeLibrary: /usr/local/lib/libmesos.so
> zkServers: 10.10.101.139:5181
> zkTimeout: 20000
> restApiPort: 8192
> profiles:
>   zero: # NMs launched with this profile dynamically obtain cpu/mem from Mesos
>     cpu: 0
>     mem: 0
>     spindles: 0
>   small:
>     cpu: 2
>     mem: 2048
>     spindles: 1
>   medium:
>     cpu: 4
>     mem: 4096
>     spindles: 2
>   large:
>     cpu: 10
>     mem: 12288
>     spindles: 4
> nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero profile.
>   medium: 1 # <profile_name : instances>
> rebalancer: false
> haEnabled: true
> servedConfigPath: /dist/config.tgz
> nodemanager:
>   jvmMaxMemoryMB: 1024
>   cpus: 0.2
>   cgroups: true
> executor:
>   jvmMaxMemoryMB: 256
>   configUri: http://172.17.0.1:8192/api/config.tgz
>   path: file:///opt/mapr/myriad/myriad-0.1/lib/myriad-executor-runnable-0.1.0.jar
>   #The following should be used for a remotely distributed URI, hdfs assumed but other URI types valid.
>   #nodeManagerUri: hdfs://namenode:port/dist/hadoop-2.7.0.tar.gz
>   #path: file:///opt/mapr/myriad/myriad-0.1/lib/myriad-executor-runnable-0.1.0.jar
> yarnEnvironment:
>   YARN_NODEMANAGER_OPTS: -Dcluster.name.prefix=/cluster1 -Dnodemanager.resource.io-spindles=4.0
>   YARN_HOME: /opt/mapr/hadoop/hadoop-2.7.0
>   HADOOP_CONF_DIR: /mnt/mesos/sandbox/config
>   HADOOP_TMP_DIR: /tmp
>   HADOOP_LOG_DIR: /mnt/mesos/sandbox
>   #JAVA_HOME: /usr/lib/jvm/java-default #System dependent, but sometimes necessary
> mesosAuthenticationPrincipal:
> mesosAuthenticationSecretFilename:
> yarn@qa101-139:/$ netstat -anlp | grep 8088
> tcp6       0      0 10.10.101.139:8088      :::*                    LISTEN      7/java
> yarn@qa101-139:/$ netstat -anlp | grep 8192
> tcp6       0      0 :::8192                 :::*                    LISTEN      7/java
> yarn@qa101-139:/$
>
> ====================================
>
> Reference: https://github.com/apache/incubator-myriad/tree/master/docker
>
> I might not have looked deeply enough to see whether there was a configuration issue
> with launching the Docker RM, but in case there is a trivial fix or config setting I missed,
> I can give this another try. Let me know if there was anything I missed.
>
> - Sarjeet Singh
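Since the executor was lost immediately (exit status -1) and the NM sandbox had no stdout/stderr, one cheap thing to rule out is whether the NM task can fetch its configuration at all. The RM log shows the task getting the distribution from http://172.17.0.2:8192/api/config.tgz, while `executor.configUri` in myriad-config-default.yml points at http://172.17.0.1:8192/api/config.tgz. Below is a minimal sketch of a hypothetical helper script (not part of Myriad; the name `check_config_uri.py` and the functions `check_url`/`main` are made up for illustration) that uses only the Python standard library to try an HTTP GET on each URL given on the command line. It only tests reachability from wherever it runs (ideally the Mesos agent host, or a container on the same Docker network the NM would use); it is an assumption, not a confirmed diagnosis, that an unreachable URL is what kills the executor.

```python
"""Hypothetical helper: check whether Myriad's served config tarball is reachable."""
from __future__ import annotations

import sys
import urllib.error
import urllib.request


def check_url(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Issue an HTTP GET against url and return (reachable, detail)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            # Read only the first KiB; we only care that the tarball is being served.
            resp.read(1024)
            return True, f"HTTP {resp.status}"
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (e.g. 404 if the file is missing).
        return False, f"HTTP {err.code}"
    except (urllib.error.URLError, OSError) as err:
        # No usable answer: connection refused, no route to host, or timeout.
        return False, f"unreachable: {err}"


def main(urls: list[str]) -> int:
    """Check every URL, print one line per URL, return a shell-style exit code."""
    failures = 0
    for url in urls:
        ok, detail = check_url(url)
        print(f"{'OK  ' if ok else 'FAIL'} {url} ({detail})")
        failures += 0 if ok else 1
    return 1 if failures else 0


# Only run when URLs are passed on the command line.
if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:]))
```

Usage, with the two URLs taken from the log and the config above:

    python3 check_config_uri.py http://172.17.0.2:8192/api/config.tgz http://172.17.0.1:8192/api/config.tgz

A `FAIL ... (unreachable: ...)` line for the URL the RM actually hands to the NM would point at networking between the `--net=host` RM and the NM container; an `HTTP 404` would instead suggest the RM is up but has no tarball to serve.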