[ 
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14582935#comment-14582935
 ] 

Vinayakumar B commented on HDFS-8277:
-------------------------------------

bq. Agreed, and this is because the manual safe mode transition is not 
logged in the edit log. That choice could have been intentional but it looks 
like the wrong choice in an HA setup. If it had been logged we would only have 
to contact the active NN. I agree your patch did not introduce this problem 
however I'd like to at least understand the reason for the original choice 
before changing any more behavior.

Hi [~arpitagarwal], please check 
[this|https://issues.apache.org/jira/browse/HDFS-3507?focusedCommentId=13437134&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13437134]
 comment from [~atm] in HDFS-3507, where the {{isChecked}} parameter was 
introduced and set to true only for {{dfs#isInSafemode()}}, since this API 
mostly serves user calls and is required to return the Active NN's state.
According to the comment, {{setSafeMode()}} is meant to change the safemode 
state of the entire HDFS service, not just one NN. So HDFS-6507 changed such 
APIs to be called on both NNs.

Also, manually entering safemode is not a permanent action. The NN stays in 
safemode until another manual request is made to leave it, or until it is 
restarted. So there is no need to record this in the edit log, right?

As far as this Jira is concerned, I think it's okay to go ahead and execute 
the command on whichever NNs are available, again because entering safemode 
is not a permanent operation.
But if setSafeMode() is restricted to only the Active NN, then 
saveNamespace(), which is very important regardless of NN state, will fail on 
the SNN because it is not in safemode.
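
To make the "go ahead with the available NNs" behaviour concrete, here is a 
minimal sketch of what I mean. It is not the attached patch and it does not use 
the real DFSAdmin or ClientProtocol classes: {{NameNodeProxy}}, 
{{SafeModeFanOut}} and {{setSafeModeOnAvailable()}} are hypothetical names 
made up for illustration. The assumption is that a NN which is down surfaces 
as a {{java.net.ConnectException}}, as in the error quoted in the description.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a per-NameNode RPC proxy (not the real ClientProtocol). */
interface NameNodeProxy {
    String address();

    /** Applies the safemode action; returns true if the NN is in safemode afterwards. */
    boolean setSafeMode(String action) throws IOException;
}

final class SafeModeFanOut {
    private SafeModeFanOut() {}

    /**
     * Sends the safemode action to every configured NameNode and skips the
     * ones that refuse the connection. Returns the addresses of the NNs that
     * accepted the request. Fails only if no NN could be reached at all.
     * Any other IOException (for example a permission error) is propagated.
     */
    static List<String> setSafeModeOnAvailable(List<NameNodeProxy> proxies, String action)
            throws IOException {
        List<String> succeeded = new ArrayList<>();
        IOException lastFailure = null;
        for (NameNodeProxy proxy : proxies) {
            try {
                proxy.setSafeMode(action);
                succeeded.add(proxy.address());
            } catch (ConnectException e) {
                // This NN is down: remember the failure and try the next one.
                lastFailure = e;
            }
        }
        if (succeeded.isEmpty()) {
            throw new IOException("No NameNode reachable for safemode " + action, lastFailure);
        }
        return succeeded;
    }
}
```

With this shape, a down Standby only reduces the list of NNs that were 
updated, and the command still fails loudly when every NN is unreachable.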

Any thoughts?

> Safemode enter fails when Standby NameNode is down
> --------------------------------------------------
>
>                 Key: HDFS-8277
>                 URL: https://issues.apache.org/jira/browse/HDFS-8277
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, HDFS, namenode
>    Affects Versions: 2.6.0
>         Environment: HDP 2.2.0
>            Reporter: Hari Sekhon
>            Assignee: surendra singh lilhore
>            Priority: Minor
>         Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch, 
> HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch
>
>
> HDFS fails to enter safemode when the Standby NameNode is down (eg. due to 
> AMBARI-10536).
> {code}hdfs dfsadmin -safemode enter
> safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused{code}
> This appears to be a bug in that it's not trying both NameNodes like the 
> standard hdfs client code does, and is instead stopping after getting a 
> connection refused from nn1 which is down. I verified normal hadoop fs writes 
> and reads via cli did work at this time, using nn2. I happened to run this 
> command as the hdfs user on nn2 which was the surviving Active NameNode.
> After I re-bootstrapped the Standby NN to fix it the command worked as 
> expected again.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
