[ https://issues.apache.org/jira/browse/HDFS-6203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13964490#comment-13964490 ]
Kihwal Lee commented on HDFS-6203:
----------------------------------
bq. As for checking the other NN's state, even if we add the check into the
transitionToActive method, we still cannot guarantee that the other NN will not
transition to active after the check. Thus I think the check here will
not be very useful.
We can make {{transitionToActive}} do the check by default when automatic
failover is not used. If the other NN does not respond or is in the active state,
the command will fail with a warning. At that point the user can reissue it with
a force option, if s/he wants to. I think this is a good preventive measure
against an easy-to-make but fatal mistake.
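A minimal sketch of the decision logic proposed above, assuming a caller-supplied function that queries the other NN's HA state. HAAdmin itself is Java; every name here (HAState, PeerCheckError, precheck_transition_to_active, get_peer_state) is hypothetical and only models the proposal, not an existing Hadoop API. As the quoted objection points out, a check like this cannot stop the peer from becoming active after it runs, so fencing is still required; it only guards against the accidental manual case.

```python
from enum import Enum
from typing import Callable


class HAState(Enum):
    # Hypothetical stand-in for the HA states a NameNode can report.
    INITIALIZING = "initializing"
    STANDBY = "standby"
    ACTIVE = "active"


class PeerCheckError(Exception):
    """Raised when the pre-transition check refuses to proceed."""


def precheck_transition_to_active(
    get_peer_state: Callable[[], HAState],
    auto_failover_enabled: bool,
    force: bool = False,
) -> list[str]:
    """Return warnings; raise PeerCheckError if the transition should be refused.

    Hypothetical sketch of the proposal: check the other NN by default when
    automatic failover is not used, refuse if it is active or unreachable,
    and let a force option override the refusal.
    """
    warnings: list[str] = []
    if auto_failover_enabled:
        # The proposal only adds this check when automatic failover is NOT used.
        return warnings
    try:
        peer_state = get_peer_state()
    except (ConnectionError, TimeoutError) as exc:
        # An unreachable peer might still be active, so treat it as unsafe.
        problem = f"other NameNode did not respond: {exc}"
    else:
        if peer_state is HAState.ACTIVE:
            problem = "other NameNode is already active"
        else:
            return warnings  # Peer is standby or initializing: safe to proceed.
    if not force:
        raise PeerCheckError(problem + "; reissue with force to override")
    # Force given: proceed with the old behavior, but keep the warning visible.
    warnings.append("WARNING: " + problem + " (overridden by force)")
    return warnings
```

Treating "did not respond" the same as "is active" is the conservative choice: from the admin's side, a peer that cannot be reached may still be serving as active.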
> check other namenode's state before HAadmin transitionToActive
> --------------------------------------------------------------
>
> Key: HDFS-6203
> URL: https://issues.apache.org/jira/browse/HDFS-6203
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ha
> Affects Versions: 2.3.0
> Reporter: patrick white
>
> Current behavior is that the HAadmin -transitionToActive command will
> complete the transition to Active even if the other namenode is already in
> Active state. This is not an allowed condition and should be handled by
> fencing; however, setting both namenodes active can happen accidentally with
> relative ease, especially in a production environment when performing manual
> maintenance operations.
> If this situation does occur, it is very serious and will likely cause data
> loss or, in the best case, require a difficult recovery to avoid data loss.
> This is requesting an enhancement to haadmin's -transitionToActive command,
> to have HAadmin check the Active state of the other namenode before
> completing the transition. If the other namenode is Active, then fail the
> request because the other NN is already active.
> I am not sure whether there is a scenario where both namenodes being Active
> is valid or desired, but to maintain functional compatibility a 'force'
> parameter could be added to override this check and allow the previous behavior.
--
This message was sent by Atlassian JIRA
(v6.2#6252)