Weiwei Yang created AMBARI-19289:
------------------------------------

             Summary: HDFS Service check fails if previous active NN is down
                 Key: AMBARI-19289
                 URL: https://issues.apache.org/jira/browse/AMBARI-19289
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.4.2
            Reporter: Weiwei Yang


Reproduce steps
1. Enable namenode HA
2. Shut down the active namenode; the standby takes over
3. Run the HDFS service check

The HDFS service check script uses

{{hdfs dfsadmin -fs hdfs://mycluster -safemode get | grep OFF}}

to check whether the namenode is out of safemode. However, this command fails
if the 1st NN is down, without ever checking the state of the 2nd NN. This is
likely an HDFS bug similar to HDFS-8277.

Proposal

There are several approaches to fix this:
# Look up each namenode address and get its safemode state with
{{hdfs dfsadmin -fs hdfs://nn_host:8020 -safemode get | grep OFF}}. As long as
one NN returns OFF, consider DFS out of safemode and continue with the rest of
the check. However, is it really necessary to add such complexity to a service
check?
# Remove the safemode check code. If HDFS is in safemode, read/write operations
will fail anyway, so the service check won't pass.
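
For illustration, approach 1 could be sketched roughly as below. This is only a
sketch under assumptions, not the actual Ambari service check code: the
function names are hypothetical, and the caller is assumed to already know the
host:port of each namenode (for example from the HA configuration).

```shell
#!/usr/bin/env bash
# Sketch of approach 1 (hypothetical helper names, not Ambari code).

# Return 0 if the namenode at $1 (host:port) reports safemode OFF.
# An unreachable namenode produces no matching output, so it counts as "not OFF".
nn_safemode_off() {
  hdfs dfsadmin -fs "hdfs://$1" -safemode get 2>/dev/null | grep -q OFF
}

# Return 0 as soon as any of the given namenodes reports safemode OFF,
# 1 if none of them does (or if no address was given).
any_nn_safemode_off() {
  local addr
  for addr in "$@"; do
    if nn_safemode_off "$addr"; then
      return 0
    fi
  done
  return 1
}
```

It would be called with every namenode address, e.g.
{{any_nn_safemode_off nn1_host:8020 nn2_host:8020}} (placeholder hosts), so a
down namenode is simply skipped instead of failing the whole check.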

I prefer #2 because it makes the script simpler and works in all cases.
Note that this is a service check: it should pass as long as HDFS is in a
working state. It is not a namenode check.
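
To make the idea behind #2 concrete, here is a minimal sketch of a check that
drops the safemode probe and relies on a write/read round trip only. It is an
assumption about what such a check could look like, not the existing script:
the function name, the {{/tmp}} default and the file name are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of approach 2 (hypothetical names): no safemode probe, just a round trip.

# Write a small file to HDFS, read it back, and clean up.
# Returns 0 only if both the write and the read succeed.
hdfs_rw_check() {
  local check_dir="${1:-/tmp}"                    # hypothetical target directory
  local remote="${check_dir}/service_check_$$"    # hypothetical file name
  local tmp rc=1
  tmp="$(mktemp)" || return 1
  echo "service-check" > "$tmp"
  # If HDFS is in safemode the write is rejected, so the check fails here.
  if hdfs dfs -put -f "$tmp" "$remote" \
      && [ "$(hdfs dfs -cat "$remote")" = "service-check" ]; then
    rc=0
  fi
  # Best-effort cleanup; errors are ignored on purpose.
  hdfs dfs -rm -f -skipTrash "$remote" >/dev/null 2>&1
  rm -f "$tmp"
  return "$rc"
}
```

Because the client goes through the default filesystem, whichever namenode is
currently active serves the request, and no per-namenode logic is needed.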



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
