[ https://issues.apache.org/jira/browse/HDFS-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646422#comment-14646422 ]
Jian Fang commented on HDFS-3743:
---------------------------------
I meant that we cannot run the "initializeSharedEdits" command to format a new
replacement JN (or any JNs at all) while the NameNode is running, because the
storage directory is locked. We saw the following exception:
ERROR namenode.NameNode: Could not initialize shared edits dir
java.io.IOException: Cannot lock storage /var/lib/hadoop/dfs-name. The directory is already locked.
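To illustrate why the second process is turned away, here is a minimal Java sketch of an exclusive lock on a directory, taken through a lock file. It is not the Hadoop code: the class name, the helper tryLock, and the lock file name "in_use.lock" are assumptions made for illustration. Both attempts run in one JVM here, so the conflict surfaces as an OverlappingFileLockException; a separate process, such as the command above, would typically just get null back from FileChannel.tryLock().

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Files;

public class StorageLockSketch {
    /**
     * Tries to take an exclusive lock on dir/in_use.lock (file name assumed).
     * Returns the lock, or null if somebody else already holds it.
     */
    public static FileLock tryLock(File dir) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(new File(dir, "in_use.lock"), "rws");
        try {
            FileLock lock = raf.getChannel().tryLock();
            if (lock == null) {
                raf.close(); // held by another process
            }
            return lock;
        } catch (OverlappingFileLockException e) {
            raf.close();     // held elsewhere in this same JVM
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("dfs-name-sketch").toFile();

        FileLock running = tryLock(dir); // stands in for the running NameNode
        FileLock second = tryLock(dir);  // stands in for initializeSharedEdits

        System.out.println("first holder got the lock:  " + (running != null));
        System.out.println("second holder got the lock: " + (second != null));
        if (second == null) {
            System.out.println("Cannot lock storage " + dir
                    + ". The directory is already locked.");
        }

        // Once the first holder lets go, the directory can be locked again.
        running.release();
        running.channel().close();
    }
}
```

The point of the sketch is only that the lock is held for as long as the first holder runs, so any tool that needs the same directory has to wait for it to stop or avoid touching that directory.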
As a result, it should be the QJM's responsibility to detect the configuration
change at run time, using HADOOP-7001, and to format the new JNs properly.
If this really works, perhaps a rolling restart of the JNs is no longer needed,
provided that they do not need to communicate with each other to make decisions
the way ZooKeeper instances do. If I understand correctly, the Quorum Journal
protocol only implements the log replication part of Paxos, right?
> QJM: improve formatting behavior for JNs
> ----------------------------------------
>
> Key: HDFS-3743
> URL: https://issues.apache.org/jira/browse/HDFS-3743
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: QuorumJournalManager (HDFS-3077)
> Reporter: Todd Lipcon
>
> Currently, the JournalNodes automatically format themselves when a new writer
> takes over, if they don't have any data for that namespace. However, this has
> a few problems:
> 1) if the administrator accidentally points a new NN at the wrong quorum (eg
> corresponding to another cluster), it will auto-format a directory on those
> nodes. This doesn't cause any data loss, but would be better to bail out with
> an error indicating that they need to be formatted.
> 2) if a journal node crashes and needs to be reformatted, it should be able
> to re-join the cluster and start storing new segments without having to fail
> over to a new NN.
> 3) if 2/3 JNs get accidentally reformatted (eg the mount point becomes
> undone), and the user starts the NN, it should fail to start, because it may
> end up missing edits. If it auto-formats in this case, the user might have
> silent "rollback" of the most recent edits.
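Points 2 and 3 of the description above can be read as a simple startup rule, sketched below in Java. This is one possible policy and not the patch for this issue: the class, the enum, and the method onNameNodeStartup are hypothetical names. The assumption is that every committed edit was written to a majority of JNs, so the NN may proceed only while a majority of JNs are still formatted; otherwise it refuses to start rather than auto-formatting.

```java
import java.util.Arrays;
import java.util.List;

public class JournalFormatPolicySketch {
    enum Decision { PROCEED, REFUSE_TO_START }

    /**
     * jnFormatted holds one flag per JournalNode: true if that JN already
     * has formatted storage for this namespace.
     */
    public static Decision onNameNodeStartup(List<Boolean> jnFormatted) {
        int formatted = 0;
        for (boolean f : jnFormatted) {
            if (f) {
                formatted++;
            }
        }
        int majority = jnFormatted.size() / 2 + 1;
        // With fewer than a majority of formatted JNs, recent edits may be
        // missing, so starting up could silently roll them back.
        return formatted >= majority ? Decision.PROCEED : Decision.REFUSE_TO_START;
    }

    public static void main(String[] args) {
        // One of three JNs was reformatted: it can re-join, the NN proceeds.
        System.out.println(onNameNodeStartup(Arrays.asList(true, true, false)));
        // Two of three JNs were reformatted: the NN should fail to start.
        System.out.println(onNameNodeStartup(Arrays.asList(true, false, false)));
    }
}
```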