As far as I can tell, I have a few configuration and setup issues on the
Docker side. I'll look into them further once I have more time. Overall,
RC2 looks good, and the Docker issue is not a blocker.

Let me know if anyone else has been able to run the Docker setup
successfully and is willing to share how :).
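
For anyone retrying this, here is a minimal sketch for tallying task status
updates and lost executors from RM console output like the log quoted
below. The function name summarize_rm_log is hypothetical, and the two
patterns are assumptions based only on the log lines in this thread, not on
any Myriad API.

```python
import re
from collections import Counter

# Patterns assumed from the log lines quoted in this thread.
STATUS_RE = re.compile(r"Status Update for task:\s*(\S+)\s*\|\s*state:\s*(\w+)")
EXIT_RE = re.compile(r"lost with exit status:\s*(-?\d+)")


def summarize_rm_log(text):
    """Count (task id, state) pairs and executor exit codes in RM output."""
    # Mail clients wrap long log lines and prefix them with '>', so strip
    # the quote markers and join everything into one line before matching.
    lines = [re.sub(r"^\s*>\s?", "", line).strip() for line in text.splitlines()]
    flat = " ".join(line for line in lines if line)
    states = Counter(STATUS_RE.findall(flat))
    exits = Counter(int(code) for code in EXIT_RE.findall(flat))
    return states, exits
```

Calling summarize_rm_log on the saved console output returns two counters:
one keyed by (task id, state) and one keyed by executor exit status, which
makes repeated TASK_FAILED updates for the same NM task easy to spot.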

-Sarjeet

On Sun, May 22, 2016 at 8:54 PM, sarjeet singh <ssarjeetsi...@gmail.com>
wrote:

> I tried out Docker from Myriad RC2 as follows, but couldn't get past the
> point where the RM is launched: it is not able to launch NMs.
>
> Here is the formatted output for the RM docker launch:
>
> root@qa101-139:~/myriad/myriad-0.2.0-incubating-rc2/docker# docker run
> --net=host -v $PWD/dist -v $PWD/config:/usr/local/hadoop/etc/hadoop
> --name='myriad-resourcemanager' -t sarjeet/myriad
>
> 2016-05-23 03:38:21,431 INFO  [main] myriad.Main
> (Main.java:initHealthChecks(140)) - Initializing HealthChecks
>
> 2016-05-23 03:38:21,445 INFO  [main] myriad.Main
> (Main.java:initProfiles(148)) - Initializing Profiles
>
> 2016-05-23 03:38:21,450 INFO  [main] scheduler.ServiceProfileManager
> (ServiceProfileManager.java:add(40)) - Adding profile zero with CPU: 0.0
> and Memory: 0.0
>
> 2016-05-23 03:38:21,450 INFO  [main] scheduler.ServiceProfileManager
> (ServiceProfileManager.java:add(40)) - Adding profile small with CPU: 2.0
> and Memory: 2048.0
>
> 2016-05-23 03:38:21,450 INFO  [main] scheduler.ServiceProfileManager
> (ServiceProfileManager.java:add(40)) - Adding profile medium with CPU: 4.0
> and Memory: 4096.0
>
> 2016-05-23 03:38:21,450 INFO  [main] scheduler.ServiceProfileManager
> (ServiceProfileManager.java:add(40)) - Adding profile large with CPU: 10.0
> and Memory: 12288.0
>
> 2016-05-23 03:38:21,451 INFO  [main] myriad.Main
> (Main.java:validateNMInstances(175)) - Validating nmInstances..
>
> 2016-05-23 03:38:21,451 INFO  [main] myriad.Main
> (Main.java:initServiceConfigurations(238)) - Initializing
> initServiceConfigurations
>
> 2016-05-23 03:38:21,534 INFO  [main] myriad.Main
> (Main.java:startMesosDriver(119)) - starting mesosDriver..
>
> 2016-05-23 03:38:21,534 INFO  [main] scheduler.MyriadDriverManager
> (MyriadDriverManager.java:startDriver(51)) - Starting driver...
>
> 2016-05-23 03:38:21,534 INFO  [main] scheduler.MyriadDriver
> (MyriadDriver.java:start(49)) - Starting driver
>
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@712: Client
> environment:zookeeper.version=zookeeper C client 3.4.5
>
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@716: Client
> environment:host.name=qa101-139
>
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@723: Client
> environment:os.name=Linux
>
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@724: Client
> environment:os.arch=3.13.0-57-generic
>
> 2016-05-23 03:38:21,535:7(0x7f4e9b3ac700):ZOO_INFO@log_env@725: Client
> environment:os.version=#95-Ubuntu SMP Fri Jun 19 09:28:15 UTC 2015
>
> I0523 03:38:21.535646    24 sched.cpp:222] Version: 0.28.1
>
> 2016-05-23 03:38:21,535 INFO  [main] scheduler.MyriadDriver
> (MyriadDriver.java:start(51)) - Driver started with status: DRIVER_RUNNING
>
> 2016-05-23 03:38:21,536 INFO  [main] scheduler.MyriadDriverManager
> (MyriadDriverManager.java:startDriver(53)) - Driver started with status:
> DRIVER_RUNNING
>
> 2016-05-23 03:38:21,536 INFO  [main] myriad.Main
> (Main.java:startMesosDriver(121)) - started mesosDriver..
>
> 2016-05-23 03:38:21,536:7(0x7f4e8d685700):ZOO_INFO@check_events@1703:
> initiated connection to server [10.10.101.139:5181]
>
> 2016-05-23 03:38:21,536 INFO  [main] interceptor.CompositeInterceptor
> (CompositeInterceptor.java:register(74)) - Registered
> org.apache.myriad.policy.LeastAMNodesFirstPolicy into the registry.
>
> 2016-05-23 03:38:21,539 INFO  [main] myriad.Main
> (Main.java:startNMInstances(226)) - Launching 1 NM(s) with profile medium
>
> 2016-05-23 03:38:21,540 INFO  [main] scheduler.MyriadOperations
> (MyriadOperations.java:flexUpCluster(80)) - Adding 1 NM instances to cluster
>
> 2016-05-23 03:38:21,555:7(0x7f4e8d685700):ZOO_INFO@check_events@1750:
> session establishment complete on server [10.10.101.139:5181],
> sessionId=0x15314ddb816d02b, negotiated timeout=10000
>
> I0523 03:38:21.555871    97 group.cpp:349] Group process (group(1)@
> 10.10.101.139:57196) connected to ZooKeeper
>
> I0523 03:38:21.555945    97 group.cpp:831] Syncing group operations: queue
> size (joins, cancels, datas) = (0, 0, 0)
>
> I0523 03:38:21.555979    97 group.cpp:427] Trying to create path '/mesos'
> in ZooKeeper
>
> I0523 03:38:21.557073    98 detector.cpp:152] Detected a new leader:
> (id='1')
>
> I0523 03:38:21.557235    75 group.cpp:700] Trying to get
> '/mesos/json.info_0000000001' in ZooKeeper
>
> I0523 03:38:21.558002    97 detector.cpp:479] A new leading master (UPID=
> master@10.10.101.139:5050) is detected
>
> I0523 03:38:21.558116    76 sched.cpp:326] New master detected at
> master@10.10.101.139:5050
>
> I0523 03:38:21.558442    76 sched.cpp:336] No credentials provided.
> Attempting to register without authentication
>
> I0523 03:38:21.559672    93 sched.cpp:703] Framework registered with
> 8114e114-db5f-4faa-afb3-ba1ae29e6368-0023
>
> 2016-05-23 03:38:21,630 INFO  [pool-2-thread-1]
> handlers.RegisteredEventHandler (RegisteredEventHandler.java:onEvent(41)) -
> Received event: org.apache.myriad.scheduler.event.RegisteredEvent@27492832
> with frameworkId: value: "8114e114-db5f-4faa-afb3-ba1ae29e6368-0023"
>
> 2016-05-23 03:38:21,691 INFO  [main] state.SchedulerState
> (SchedulerState.java:addNodes(77)) - Marked taskId
> nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 pending, size of pending
> queue for nm is: 0
>
> 2016-05-23 03:38:21,702 INFO  [pool-2-thread-3]
> handlers.ResourceOffersEventHandler
> (ResourceOffersEventHandler.java:onEvent(100)) - Received offers 1
>
> 2016-05-23 03:38:21,702 INFO  [main] scheduler.MyriadOperations
> (MyriadOperations.java:flexUpAService(139)) - Adding 1 jobhistory instances
> to cluster
>
> 2016-05-23 03:38:21,708 INFO  [main] state.SchedulerState
> (SchedulerState.java:addNodes(77)) - Marked taskId
> jobhistory.jobhistory.78b0a6a2-a47d-412f-869d-6fd9872ef85a pending, size of
> pending queue for jobhistory is: 0
>
> 2016-05-23 03:38:21,713 INFO  [main]
> interceptor.MyriadInitializationInterceptor
> (MyriadInitializationInterceptor.java:init(54)) - Initialized myriad.
>
> 2016-05-23 03:38:21,714 WARN  [pool-2-thread-3]
> handlers.ResourceOffersEventHandler
> (ResourceOffersEventHandler.java:matches(198)) - Ignoring unknown resource
> type: dfsio_spindles
>
> 2016-05-23 03:38:21,724 INFO  [pool-2-thread-3]
> scheduler.TaskFactory$NMTaskFactoryImpl
> (TaskFactory.java:getCommandInfo(138)) - Getting Hadoop distribution from:
> http://172.17.0.2:8192/api/config.tgz
>
> 2016-05-23 03:38:21,749 INFO  [main] ipc.CallQueueManager
> (CallQueueManager.java:<init>(53)) - Using callQueue class
> java.util.concurrent.LinkedBlockingQueue
>
> 2016-05-23 03:38:21,758 INFO  [Socket Reader #1 for port 8031] ipc.Server
> (Server.java:run(606)) - Starting Socket Reader #1 for port 8031
>
> 2016-05-23 03:38:21,774 INFO  [main] pb.RpcServerFactoryPBImpl
> (RpcServerFactoryPBImpl.java:createServer(174)) - Adding protocol
> org.apache.hadoop.yarn.server.api.ResourceTrackerPB to the server
>
> 2016-05-23 03:38:21,775 INFO  [IPC Server Responder] ipc.Server
> (Server.java:run(836)) - IPC Server Responder: starting
>
> 2016-05-23 03:38:21,775 INFO  [IPC Server listener on 8031] ipc.Server
> (Server.java:run(676)) - IPC Server listener on 8031: starting
>
> 2016-05-23 03:38:21,786 INFO  [pool-2-thread-3]
> handlers.ResourceOffersEventHandler
> (ResourceOffersEventHandler.java:onEvent(139)) - Launching task:
> nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 using offer: value:
> "8114e114-db5f-4faa-afb3-ba1ae29e6368-O98094"
>
>
> 2016-05-23 03:38:21,802 INFO  [main] ipc.CallQueueManager
> (CallQueueManager.java:<init>(53)) - Using callQueue class
> java.util.concurrent.LinkedBlockingQueue
>
> 2016-05-23 03:38:21,803 INFO  [pool-2-thread-9]
> handlers.ExecutorLostEventHandler
> (ExecutorLostEventHandler.java:onEvent(39)) - Executor value:
> "myriad_executor8114e114-db5f-4faa-afb3-ba1ae29e6368-00238114e114-db5f-4faa-afb3-ba1ae29e6368-O980948114e114-db5f-4faa-afb3-ba1ae29e6368-S1"
>
>  of slave value: "8114e114-db5f-4faa-afb3-ba1ae29e6368-S1"
>
>  lost with exit status: -1
>
> 2016-05-23 03:38:21,809 INFO  [Socket Reader #1 for port 8030] ipc.Server
> (Server.java:run(606)) - Starting Socket Reader #1 for port 8030
>
> 2016-05-23 03:38:21,824 INFO  [pool-2-thread-5]
> handlers.StatusUpdateEventHandler
> (StatusUpdateEventHandler.java:onEvent(60)) - Status Update for task:
> nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 | state: TASK_FAILED
>
> 2016-05-23 03:38:21,852 INFO  [main] pb.RpcServerFactoryPBImpl
> (RpcServerFactoryPBImpl.java:createServer(174)) - Adding protocol
> org.apache.hadoop.yarn.api.ApplicationMasterProtocolPB to the server
>
> 2016-05-23 03:38:21,853 INFO  [IPC Server Responder] ipc.Server
> (Server.java:run(836)) - IPC Server Responder: starting
>
> 2016-05-23 03:38:21,853 INFO  [IPC Server listener on 8030] ipc.Server
> (Server.java:run(676)) - IPC Server listener on 8030: starting
>
> 2016-05-23 03:38:22,504 INFO  [IPC Server listener on 8033] ipc.Server
> (Server.java:run(676)) - IPC Server listener on 8033: starting
>
> 2016-05-23 03:38:22,755 INFO  [pool-2-thread-3]
> handlers.ResourceOffersEventHandler
> (ResourceOffersEventHandler.java:onEvent(100)) - Received offers 1
>
> 2016-05-23 03:38:22,755 WARN  [pool-2-thread-3]
> handlers.ResourceOffersEventHandler
> (ResourceOffersEventHandler.java:matches(198)) - Ignoring unknown resource
> type: dfsio_spindles
>
> 2016-05-23 03:38:22,756 INFO  [pool-2-thread-3]
> scheduler.TaskFactory$NMTaskFactoryImpl
> (TaskFactory.java:getCommandInfo(138)) - Getting Hadoop distribution from:
> http://172.17.0.2:8192/api/config.tgz
>
> 2016-05-23 03:38:22,757 INFO  [pool-2-thread-3]
> handlers.ResourceOffersEventHandler
> (ResourceOffersEventHandler.java:onEvent(139)) - Launching task:
> nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 using offer: value:
> "8114e114-db5f-4faa-afb3-ba1ae29e6368-O98095"
>
>
> 2016-05-23 03:38:22,768 INFO  [pool-2-thread-9]
> handlers.ExecutorLostEventHandler
> (ExecutorLostEventHandler.java:onEvent(39)) - Executor value:
> "myriad_executor8114e114-db5f-4faa-afb3-ba1ae29e6368-00238114e114-db5f-4faa-afb3-ba1ae29e6368-O980958114e114-db5f-4faa-afb3-ba1ae29e6368-S1"
>
>  of slave value: "8114e114-db5f-4faa-afb3-ba1ae29e6368-S1"
>
>  lost with exit status: -1
>
> 2016-05-23 03:38:22,773 INFO  [pool-2-thread-5]
> handlers.StatusUpdateEventHandler
> (StatusUpdateEventHandler.java:onEvent(60)) - Status Update for task:
> nm.medium.ce650d07-e6a6-48bf-aef3-be11c7385bd0 | state: TASK_FAILED
>
> ====================================
>
> To debug the failure above, I checked the NM task's stdout/stderr but
> couldn't get any logs (see the attached screenshot).
>
> Then, after attaching to the running container, I found the following:
>
> root@qa101-139:~/myriad/myriad-0.2.0-incubating-rc2/docker# docker exec
> -it 8f730dc2de1a /bin/bash
>
> yarn@qa101-139:/$
>
> yarn@qa101-139:/$ ps -ef
>
> UID        PID  PPID  C STIME TTY          TIME CMD
>
> yarn         1     0  0 03:38 ?        00:00:00 /bin/sh -c
> /usr/local/hadoop/bin/yarn resourcemanager
>
> yarn         7     1 14 03:38 ?        00:00:14 /usr/bin/java
> -Dproc_resourcemanager -Xmx1000m -Dhadoop.log.dir=/usr/local/hadoop/logs
> -Dyarn.log.dir=/usr/l
>
> yarn       296     0  0 03:39 ?        00:00:00 /bin/bash
>
> yarn       302     0  0 03:39 ?        00:00:00 /bin/bash
>
> yarn       309   302  0 03:39 ?        00:00:00 ps -ef
>
> yarn@qa101-139:/$ ls -l /usr/local/hadoop/
>
> total 52
>
> -rw-r--r-- 1 root root 15429 Apr 10  2015 LICENSE.txt
>
> -rw-r--r-- 1 root root   101 Apr 10  2015 NOTICE.txt
>
> -rw-r--r-- 1 root root  1366 Apr 10  2015 README.txt
>
> drwxr-xr-x 2 root root  4096 May 22 19:22 bin
>
> drwxr-xr-x 6 root root  4096 May 22 20:42 etc
>
> drwxr-xr-x 2 root root  4096 May 22 19:22 include
>
> drwxr-xr-x 4 root root  4096 May 22 19:22 lib
>
> drwxr-xr-x 2 root root  4096 May 22 19:22 libexec
>
> drwxr-xr-x 2 root root  4096 May 22 19:22 sbin
>
> drwxr-xr-x 7 root root  4096 May 22 20:42 share
>
> yarn@qa101-139:/$ ls -l /usr/local/hadoop/etc/hadoop/
>
> total 16
>
> -rw-r--r-- 1 root root 1340 May 22 20:02 mapred-site.xml
>
> -rw-r--r-- 1 root root 3395 May 23 03:38 myriad-config-default.yml
>
> -rw-r--r-- 1 root root 4207 May 23 00:27 yarn-site.xml
>
> yarn@qa101-139:/$ cat
> /usr/local/hadoop/etc/hadoop/myriad-config-default.yml
>
> mesosMaster: zk://10.10.101.139:5181/mesos   ->> (Running on the host
> outside of the container)
>
> #Container information for the node managers
>
> containerInfo:
>
>     type: DOCKER
>
>     dockerInfo:
>
>         image: sarjeet/myriad
>
>     volumes:
>
>         -
>
>           containerPath: /tmp
>
>           hostPath: /tmp
>
> checkpoint: false
>
> frameworkFailoverTimeout: 43200000
>
> frameworkName: MyriadAlpha
>
> frameworkRole:
>
> frameworkUser: mapr
>
>                           # running the resource manager.
>
> frameworkSuperUser: root  # To be depricated, currently permissions need
> set by a superuser due to Mesos-1790.  Must be
>
>                           # root or have passwordless sudo. Required if
> nodeManagerURI set, ignored otherwise.
>
> nativeLibrary: /usr/local/lib/libmesos.so
>
> zkServers: 10.10.101.139:5181
>
> zkTimeout: 20000
>
> restApiPort: 8192
>
> profiles:
>
>   zero:  # NMs launched with this profile dynamically obtain cpu/mem from
> Mesos
>
>     cpu: 0
>
>     mem: 0
>
>     spindles: 0
>
>   small:
>
>     cpu: 2
>
>     mem: 2048
>
>     spindles: 1
>
>   medium:
>
>     cpu: 4
>
>     mem: 4096
>
>     spindles: 2
>
>   large:
>
>     cpu: 10
>
>     mem: 12288
>
>     spindles: 4
>
> nmInstances: # NMs to start with. Requires at least 1 NM with a non-zero
> profile.
>
>   medium: 1 # <profile_name : instances>
>
> rebalancer: false
>
> haEnabled: true
>
> servedConfigPath: /dist/config.tgz
>
> nodemanager:
>
>   jvmMaxMemoryMB: 1024
>
>   cpus: 0.2
>
>   cgroups: true
>
> executor:
>
>   jvmMaxMemoryMB: 256
>
>   configUri: http://172.17.0.1:8192/api/config.tgz
>
>   path:
> file:///opt/mapr/myriad/myriad-0.1/lib/myriad-executor-runnable-0.1.0.jar
>
>   #The following should be used for a remotely distributed URI, hdfs
> assumed but other URI types valid.
>
>   #nodeManagerUri: hdfs://namenode:port/dist/hadoop-2.7.0.tar.gz
>
>   #path:
> file:///opt/mapr/myriad/myriad-0.1/lib/myriad-executor-runnable-0.1.0.jar
>
> yarnEnvironment:
>
>   YARN_NODEMANAGER_OPTS: -Dcluster.name.prefix=/cluster1
> -Dnodemanager.resource.io-spindles=4.0
>
>   YARN_HOME: /opt/mapr/hadoop/hadoop-2.7.0
>
>   HADOOP_CONF_DIR: /mnt/mesos/sandbox/config
>
>   HADOOP_TMP_DIR: /tmp
>
>   HADOOP_LOG_DIR: /mnt/mesos/sandbox
>
>   #JAVA_HOME: /usr/lib/jvm/java-default #System dependent, but sometimes
> necessary
>
> mesosAuthenticationPrincipal:
>
> mesosAuthenticationSecretFilename:
>
> yarn@qa101-139:/$ netstat -anlp | grep 8088
>
> tcp6       0      0 10.10.101.139:8088      :::*
> LISTEN      7/java
>
> yarn@qa101-139:/$ netstat -anlp | grep 8192
>
> tcp6       0      0 :::8192                 :::*
> LISTEN      7/java
>
> yarn@qa101-139:/$
>
> ====================================
>
> Reference: https://github.com/apache/incubator-myriad/tree/master/docker
>
> I may not have looked deeply enough to tell whether there is a
> configuration issue in launching the Docker RM. If there is a trivial
> fix or a config setting I missed, I can give this another try. Let me
> know if I missed anything.
> - Sarjeet Singh
>
