Kevin W Monroe has proposed merging 
lp:~bigdata-dev/charms/trusty/hdp-hadoop/jpshookfixes into 
lp:~bigdata-dev/charms/trusty/hdp-hadoop/trunk.

Requested reviews:
  Juju Big Data Development (bigdata-dev)

For more details, see:
https://code.launchpad.net/~bigdata-dev/charms/trusty/hdp-hadoop/jpshookfixes/+merge/248978

We check whether services are running with 'is_jvm_service_active', which uses jps.
This was running as root on our units, but jps run as root doesn't show the
process names we need to parse to tell whether a service is really active. For
example, here's jps output when run as root on a yarn-hdfs master:

root@juju-canonistack-machine-20:~# jps
27783 Jps
22062 -- process information unavailable
23542 -- process information unavailable

That's less than helpful. So much so that we get relation failures, because the
charms try to fire up services that are already running. With this MP, we run
jps as the appropriate user for a given service (usually either hdfs or yarn).

This yields goodness:

ubuntu@juju-canonistack-machine-20:~$ sudo su - hdfs -c jps
22062 NameNode
27825 Jps

ubuntu@juju-canonistack-machine-20:~$ sudo su - yarn -c jps
23542 ResourceManager
27839 Jps

I'm not a big fan of having a dict with hard-coded strings, but the alternative
is to pass a username in with every call to is_jvm_service_active. I'll go that
route if the herd wants, but this way was less typing for me.
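The process-to-user lookup boils down to a small pure function. Here's a standalone sketch of that logic, with hypothetical default usernames standing in for the YARN_USER/HDFS_USER environment variables the charm actually reads:

```python
def user_for_process(processname, yarn_user="yarn", hdfs_user="hdfs"):
    """Return the service user that owns a given JVM process.

    Sketch of the lookup in hooks/bdutils.py; the real helper pulls
    yarn_user/hdfs_user from os.environ['YARN_USER'] / os.environ['HDFS_USER'].
    """
    processusers = {
        "JobHistoryServer": yarn_user,
        "ResourceManager": yarn_user,
        "NodeManager": yarn_user,
        "DataNode": hdfs_user,
        "NameNode": hdfs_user,
    }
    # Unknown process names fall back to the hdfs user.
    return processusers.get(processname, hdfs_user)
```

Anything not in the dict (e.g. a SecondaryNameNode, if we ever check for one) falls through to the hdfs user, which matches the default in the MP.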
=== modified file 'README.md'
--- README.md	2014-12-10 23:31:55 +0000
+++ README.md	2015-02-06 22:41:08 +0000
@@ -57,7 +57,7 @@
 service units as HDFS namenode and the HDFS datanodes also run YARN NodeManager::
     juju deploy hdp-hadoop yarn-hdfs-master
     juju deploy hdp-hadoop compute-node
-    juju add-unit -n 2 yarn-hdfs-master
+    juju add-unit -n 2 compute-node
     juju add-relation yarn-hdfs-master:namenode compute-node:datanode
     juju add-relation yarn-hdfs-master:resourcemanager compute-node:nodemanager
 

=== modified file 'hooks/bdutils.py'
--- hooks/bdutils.py	2014-12-26 13:50:41 +0000
+++ hooks/bdutils.py	2015-02-06 22:41:08 +0000
@@ -128,7 +128,16 @@
                 os.environ[ll[0]] = ll[1].strip().strip(';').strip("\"").strip()
                 
 def is_jvm_service_active(processname):
-    cmd=["jps"]
+    processusers = {
+        "JobHistoryServer": os.environ['YARN_USER'],
+        "ResourceManager": os.environ['YARN_USER'],
+        "NodeManager": os.environ['YARN_USER'],
+        "DataNode": os.environ['HDFS_USER'],
+        "NameNode": os.environ['HDFS_USER'],
+        }
+    # set user based on given process, defaulting to hdfs user
+    username = processusers.get(processname, os.environ['HDFS_USER'])
+    cmd = shlex.split("su {u} -c jps".format(u=username))
     p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
     out, err = p.communicate()
     if err == None and str(out).find(processname) != -1:

=== modified file 'hooks/hdp-hadoop-common.py'
--- hooks/hdp-hadoop-common.py	2014-12-26 13:50:41 +0000
+++ hooks/hdp-hadoop-common.py	2015-02-06 22:41:08 +0000
@@ -434,6 +434,7 @@
 @hooks.hook('resourcemanager-relation-joined')
 def resourcemanager_relation_joined():
     log ("==> resourcemanager-relation-joined","INFO")
+    setHadoopEnvVar()
     if is_jvm_service_active("ResourceManager"):
         relation_set(resourceManagerReady=True)
         relation_set(resourceManager_hostname=get_unit_hostname())
@@ -443,12 +444,12 @@
         sys.exit(0)
     shutil.copy(os.path.join(os.path.sep, os.environ['CHARM_DIR'],\
                              'files', 'scripts', "terasort.sh"), home)
-    setHadoopEnvVar()
     relation_set(resourceManager_ip=unit_get('private-address'))
     relation_set(resourceManager_hostname=get_unit_hostname())
     configureYarn(unit_get('private-address'))
     start_resourcemanager(os.environ["YARN_USER"])
-    start_jobhistory()
+    # TODO: (kwm) start_jh fails if historyserver is running. is it ok to restart_jh here?
+    restart_jobhistory()
     open_port(8025)
     open_port(8050)
     open_port(8020)
@@ -475,6 +476,9 @@
     # nodemanager requires data node daemon
     if not is_jvm_service_active("DataNode"):
         start_datanode(os.environ['HDFS_USER'])
+    # TODO: (kwm) start_nm fails if nm is running. is it ok to stop first?
+    if is_jvm_service_active("NodeManager"):
+        stop_nodemanager(os.environ["YARN_USER"])
     start_nodemanager(os.environ["YARN_USER"])
     open_port(8025)
     open_port(8030)
@@ -506,11 +510,11 @@
 def namenode_relation_joined():
     log("Configuring namenode - joined phase", "INFO")
 
+    setHadoopEnvVar()
     if is_jvm_service_active("NameNode"):
         relation_set(nameNodeReady=True)
         relation_set(namenode_hostname=get_unit_hostname())
         return
-    setHadoopEnvVar()
     setDirPermission(os.environ['DFS_NAME_DIR'], os.environ['HDFS_USER'], os.environ['HADOOP_GROUP'], 0755)
     relation_set(namenode_hostname=get_unit_hostname())
     configureHDFS(unit_get('private-address'))
@@ -523,7 +527,8 @@
     HDFS_command("dfs -mkdir -p /user/ubuntu")
     HDFS_command("dfs -chown ubuntu /user/ubuntu")
     HDFS_command("dfs -chmod -R 755 /user/ubuntu")
-    start_jobhistory()
+    # TODO: (kwm) start_jh fails if historyserver is running. is it ok to restart_jh here?
+    restart_jobhistory()
     open_port(8020)
     open_port(8010)
     open_port(50070)
@@ -550,6 +555,9 @@
     fileSetKV(hosts_path, nodename_ip+' ', nodename_hostname)
     configureHDFS(nodename_ip)
     setDirPermission(os.environ['DFS_DATA_DIR'], os.environ['HDFS_USER'], os.environ['HADOOP_GROUP'], 0750)
+    # TODO: (kwm) start_dn fails if dn is running. is it ok to stop first?
+    if is_jvm_service_active("DataNode"):
+        stop_datanode(os.environ["HDFS_USER"])
     start_datanode(os.environ["HDFS_USER"])
     if not is_jvm_service_active("DataNode"):
         log("error ==> DataNode failed to start")
