[ https://issues.apache.org/jira/browse/AMBARI-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weiwei Yang updated AMBARI-19289:
---------------------------------
    Attachment: AMBARI-19289_branch-2.5.01.patch

> HDFS Service check fails if previous active NN is down
> ------------------------------------------------------
>
>                 Key: AMBARI-19289
>                 URL: https://issues.apache.org/jira/browse/AMBARI-19289
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.4.2
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>         Attachments: AMBARI-19289_branch-2.5.01.patch, AMBARI-19289_trunk.01.patch, AMBARI-19289_trunk.02.patch
>
>
> *Reproduce steps*
> # Enable namenode HA
> # Shut down the active namenode; the standby takes over
> # Run the HDFS service check
> The HDFS service check script uses
> {{hdfs dfsadmin -fs hdfs://mycluster -safemode get | grep OFF}}
> to check whether the namenode is out of safemode. However, this command fails
> if the 1st NN is down, without checking the state of the 2nd NN. This is likely
> an HDFS bug similar to HDFS-8277.
> *Proposal*
> There are several approaches to fix this:
> # Loop over each namenode address and get the safemode state with
> {{hdfs dfsadmin -fs hdfs://nn_host:8020 -safemode get | grep OFF}}. As long as
> one NN returns OFF, consider DFS to be out of safemode and continue with the
> rest of the check. However, is it really necessary to add such complexity to a
> service check?
> # Remove the safemode check code. If HDFS is in safemode, read/write
> operations will fail anyway, so the service check won't pass.
> I prefer #2 because it makes the script simpler and it works in all cases.
> Note that this is a service check: it should pass as long as HDFS is in a
> working state. It is not a namenode check.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
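
For illustration only, approach #1 from the proposal in this issue (query each namenode in turn and accept the first OFF) could be sketched roughly as below. This is a standalone Python 3 sketch, not the Ambari service check script and not the attached patch. The function names, the injectable `run` parameter, and the 60-second timeout are assumptions made for the example. The only command it relies on is the `hdfs dfsadmin -fs hdfs://nn_host:8020 -safemode get` call quoted in the description, and the substring test for `OFF` mirrors the `grep OFF` used there.

```python
import subprocess


def safemode_is_off(namenode_address, run=subprocess.run):
    """Return True if one namenode reports that safemode is OFF.

    namenode_address is a "host:port" string (hypothetical helper; the
    command line is the one quoted in the issue description).
    """
    cmd = ["hdfs", "dfsadmin", "-fs", "hdfs://" + namenode_address,
           "-safemode", "get"]
    try:
        result = run(cmd, capture_output=True, text=True, timeout=60)
    except (OSError, subprocess.TimeoutExpired):
        # hdfs binary missing or the call hung: treat as "not OFF".
        return False
    # Same test as `| grep OFF` in the original check.
    return result.returncode == 0 and "OFF" in result.stdout


def any_namenode_out_of_safemode(namenode_addresses, run=subprocess.run):
    """Return True as soon as any namenode in the list reports OFF."""
    return any(safemode_is_off(addr, run) for addr in namenode_addresses)
```

A caller would pass the namenode RPC addresses of the HA pair, for example `any_namenode_out_of_safemode(["nn1.example.com:8020", "nn2.example.com:8020"])`, where both hostnames are placeholders. A namenode that is down simply makes its command fail, so the loop moves on to the next address. This also shows the extra moving parts that the description weighs against approach #2.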