[
https://issues.apache.org/jira/browse/HDFS-8277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14597130#comment-14597130
]
Vinayakumar B commented on HDFS-8277:
-------------------------------------
bq. If we cannot make the change for 2.x I prefer not changing the current
behavior of failing 'safemode enter' when SBN is down.
In the case where the SNN is down, perhaps for maintenance, but is still listed
in the configuration, moving on to the next NameNode on a ConnectException
seems reasonable.
To avoid unexpected behavior, we could add an active/standby check on the next
NameNode before changing the safemode status, and change it only if that
NameNode is active.
Though this is a kind of workaround rather than a compatibility-breaking
change, IMO the proposal as in the v1 patch seems reasonable.
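To make the idea concrete, here is a rough sketch of the control flow, not the actual patch and not the real dfsadmin code. All the types in it (NameNode, HaState, FakeNameNode, enterSafeModeOnActive) are hypothetical stand-ins invented for illustration; the real client would go through the HA proxy and RPC layers. It only shows the two rules discussed here: skip a NameNode that refuses the connection, and change safemode only on a NameNode that reports itself active.

```java
import java.net.ConnectException;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; none of these types are Hadoop APIs. */
public class SafeModeFailoverSketch {

    /** Hypothetical HA state reported by a NameNode. */
    enum HaState { ACTIVE, STANDBY }

    /** Hypothetical minimal view of a NameNode endpoint. */
    interface NameNode {
        HaState getHaState() throws ConnectException;
        void enterSafeMode() throws ConnectException;
    }

    /**
     * Walks the configured NameNodes in order. A NameNode that refuses the
     * connection is skipped; a reachable standby is left untouched.
     * Returns the index of the NameNode that entered safemode, or -1 if no
     * active NameNode could be reached.
     */
    static int enterSafeModeOnActive(List<? extends NameNode> nameNodes) {
        for (int i = 0; i < nameNodes.size(); i++) {
            NameNode nn = nameNodes.get(i);
            try {
                if (nn.getHaState() != HaState.ACTIVE) {
                    continue; // reachable but not active: do not change it
                }
                nn.enterSafeMode();
                return i;
            } catch (ConnectException e) {
                // This NameNode is down; fall through to the next one.
            }
        }
        return -1;
    }

    /** In-memory stand-in for a NameNode, used only by this demo. */
    static final class FakeNameNode implements NameNode {
        final boolean up;
        final HaState state;
        boolean inSafeMode = false;

        FakeNameNode(boolean up, HaState state) {
            this.up = up;
            this.state = state;
        }

        @Override
        public HaState getHaState() throws ConnectException {
            if (!up) {
                throw new ConnectException("Connection refused");
            }
            return state;
        }

        @Override
        public void enterSafeMode() throws ConnectException {
            if (!up) {
                throw new ConnectException("Connection refused");
            }
            inSafeMode = true;
        }
    }

    public static void main(String[] args) {
        // Mirrors the report: nn1 (standby) is down, nn2 is the active NN.
        FakeNameNode nn1 = new FakeNameNode(false, HaState.STANDBY);
        FakeNameNode nn2 = new FakeNameNode(true, HaState.ACTIVE);
        int index = enterSafeModeOnActive(Arrays.asList(nn1, nn2));
        System.out.println("safemode entered on NameNode index " + index);
    }
}
```

With this shape, the case where only a standby is reachable ends with no safemode change at all, which is the "unexpected behavior" the extra check is meant to avoid.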
Any thoughts?
> Safemode enter fails when Standby NameNode is down
> --------------------------------------------------
>
> Key: HDFS-8277
> URL: https://issues.apache.org/jira/browse/HDFS-8277
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: ha, HDFS, namenode
> Affects Versions: 2.6.0
> Environment: HDP 2.2.0
> Reporter: Hari Sekhon
> Assignee: Surendra Singh Lilhore
> Priority: Minor
> Attachments: HDFS-8277-safemode-edits.patch, HDFS-8277.patch,
> HDFS-8277_1.patch, HDFS-8277_2.patch, HDFS-8277_3.patch, HDFS-8277_4.patch
>
>
> HDFS fails to enter safemode when the Standby NameNode is down (e.g. due to
> AMBARI-10536).
> {code}hdfs dfsadmin -safemode enter
> safemode: Call From nn2/x.x.x.x to nn1:8020 failed on connection exception:
> java.net.ConnectException: Connection refused; For more details see:
> http://wiki.apache.org/hadoop/ConnectionRefused{code}
> This appears to be a bug in that it's not trying both NameNodes like the
> standard hdfs client code does, and is instead stopping after getting a
> connection refused from nn1 which is down. I verified normal hadoop fs writes
> and reads via cli did work at this time, using nn2. I happened to run this
> command as the hdfs user on nn2 which was the surviving Active NameNode.
> After I re-bootstrapped the Standby NN to fix it the command worked as
> expected again.