Sarjeet Singh created MYRIAD-128:
------------------------------------

             Summary: Issue with Flex down, Pending NMs stuck in staging and 
don't get to active task.
                 Key: MYRIAD-128
                 URL: https://issues.apache.org/jira/browse/MYRIAD-128
             Project: Myriad
          Issue Type: Bug
          Components: Scheduler
    Affects Versions: Myriad 0.1.0
            Reporter: Sarjeet Singh


I am seeing an issue when flexing NMs from the Myriad UI. After flexing down the 
active NM, the pending NMs do not move to the active state (they are not shown 
under 'Active Tasks'), and no active NM is shown on the Myriad UI, although an 
NM is running on the node (verified from jps). 

mapr     20528 20526  1 17:23 ?        00:00:26 
/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.85.x86_64/bin/java -Dproc_nodemanager 
-Xmx1000m -Dhadoop.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs 
-Dyarn.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dhadoop.log.file=yarn.log 
-Dyarn.log.file=yarn.log -Dyarn.home.dir= -Dyarn.id.str= 
-Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console 
-Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native 
-Dyarn.policy.file=hadoop-policy.xml -server 
-Dnodemanager.resource.io-spindles=4.0 
-Dyarn.resourcemanager.hostname=testrm.marathon.mesos 
-Dyarn.nodemanager.container-executor.class=org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor
 -Dnodemanager.resource.cpu-vcores=0 -Dnodemanager.resource.memory-mb=0 
-Dmyriad.yarn.nodemanager.address=0.0.0.0:31000 
-Dmyriad.yarn.nodemanager.localizer.address=0.0.0.0:31001 
-Dmyriad.yarn.nodemanager.webapp.address=0.0.0.0:31002 
-Dmyriad.mapreduce.shuffle.port=0.0.0.0:31003 -Dhadoop.login=maprsasl 
-Dhttps.protocols=TLSv1.2 
-Djava.security.auth.login.config=/opt/mapr/conf/mapr.login.conf 
-Dzookeeper.sasl.clientconfig=Client_simple 
-Dzookeeper.saslprovider=com.mapr.security.simplesasl.SimpleSaslProvider 
-Dhadoop.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs 
-Dyarn.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs -Dhadoop.log.file=yarn.log 
-Dyarn.log.file=yarn.log -Dyarn.home.dir=/opt/mapr/hadoop/hadoop-2.7.0 
-Dhadoop.home.dir=/opt/mapr/hadoop/hadoop-2.7.0 
-Dhadoop.root.logger=INFO,console -Dyarn.root.logger=INFO,console 
-Djava.library.path=/opt/mapr/hadoop/hadoop-2.7.0/lib/native -classpath 
/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/common/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/hdfs/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/mapreduce/*:/contrib/capacity-scheduler/*.jar:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/*:/opt/mapr/hadoop/hadoop-2.7.0/share/hadoop/yarn/lib/*:/opt/mapr/hadoop/hadoop-2.7.0/etc/hadoop/nm-config/log4j.properties:/opt/mapr/lib/JPam-1.1.jar
 org.apache.hadoop.yarn.server.nodemanager.NodeManager

From the Myriad UI:

Active Tasks
Killable Tasks
Pending Tasks
Staging Tasks
    nm.large.123badb1-57d8-4bd2-aa2e-de9fc1898c7f
    nm.medium.f2c4126c-4cb2-46af-a1e0-690034b914b8
    nm.medium.a9e9fd84-350a-48bc-bcd2-8712ecdc8c66
    nm.medium.663f9c6e-f28e-4395-8540-70c306eb04c5
    nm.medium.93f7cc91-9263-48a7-821e-3b0ffbe70e66

This is still the state about 30 minutes after flexing down the NM.
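A check like "still in staging after 30 minutes" can be made repeatable with a small helper. The following is only a sketch under stated assumptions: the function name `stuck_in_staging`, the `first_seen` mapping, and the timestamps are hypothetical and are not part of Myriad. The caller is assumed to record, by whatever means, when each task ID first appeared under 'Staging Tasks'.

```python
from datetime import datetime, timedelta


def stuck_in_staging(first_seen, staging_now, now, limit=timedelta(minutes=30)):
    """Return the task IDs that have sat in staging for longer than `limit`.

    first_seen:  dict mapping task ID -> datetime it was first observed in staging
                 (hypothetical bookkeeping kept by the caller, not a Myriad API)
    staging_now: iterable of task IDs currently listed under 'Staging Tasks'
    """
    return sorted(
        task for task in staging_now
        if task in first_seen and now - first_seen[task] > limit
    )


# Example with two of the task IDs from this report; the timestamps are made up.
start = datetime(2015, 1, 1, 17, 0)
first_seen = {
    "nm.large.123badb1-57d8-4bd2-aa2e-de9fc1898c7f": start,
    "nm.medium.f2c4126c-4cb2-46af-a1e0-690034b914b8": start + timedelta(minutes=20),
}
stuck = stuck_in_staging(first_seen, first_seen.keys(), now=start + timedelta(minutes=35))
print(stuck)
```

Only tasks that exceed the limit are reported, so a task that entered staging recently is not flagged as stuck.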

I tried this only on a single-node cluster, but it looks like the problem could 
happen on other setups as well.

I started the RM from Marathon and got the RM and Myriad up and running. When 
the RM launched, a CGS (medium profile) NM was launched along with it and was 
shown as an 'Active Task' on the Myriad UI. I then launched some large profile 
and zero profile NMs, which were shown under 'Pending Tasks' because a (CGS 
default) NM was already running on the single-node cluster.

Then I flexed down the NM from the Myriad UI. This flexed down the active NM, 
and all pending NMs started moving to staging, where they remained stuck. After 
waiting more than 30 minutes, I do not see any active NM task, and all of the 
previously pending NM tasks are shown only under 'Staging Tasks'. (See the 
screenshot.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
