[
https://issues.apache.org/jira/browse/HDFS-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222573#comment-15222573
]
Jian Fang commented on HDFS-3743:
---------------------------------
I haven't had a chance to work on this yet, so I'm coming back to this issue
now. Since HADOOP-7001 still has a long way to go, I would like to start by
fixing a specific case first, i.e., making QJM able to format a new journal
node after an existing journal node is replaced.
My idea is to add some logic at the beginning of the following method in
QuorumJournalManager:
    Map<AsyncLogger, NewEpochResponseProto> createNewUniqueEpoch()
        throws IOException
that checks all available journal nodes by calling:
    QuorumCall<AsyncLogger, Boolean> call = loggers.isFormatted();
The call would wait for all journal nodes to respond, and time out after a
given period to avoid waiting forever. If the call times out, simply ignore it
and continue the normal workflow in createNewUniqueEpoch(). If the call
succeeds, check whether any journal node is unformatted, and call
format(nsInfo) on the logger of each unformatted node. Since nsInfo is
available to QJM, I think it should be able to format the new journal node
successfully.
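A minimal sketch of the proposed pre-step, to make the control flow concrete. It does not use Hadoop's real AsyncLogger, AsyncLoggerSet, or QuorumCall classes: JournalLogger and FormatNewJournals are hypothetical names, CompletableFuture stands in for the asynchronous RPCs, and nsInfo is modeled as a plain String instead of NamespaceInfo. It waits (with a bound) for every isFormatted() response, skips the step on timeout or error, and otherwise formats each node that reports itself unformatted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for AsyncLogger, reduced to the two calls the proposal needs.
interface JournalLogger {
    CompletableFuture<Boolean> isFormatted();

    // nsInfo is modeled as a String here; real code would pass a NamespaceInfo.
    CompletableFuture<Void> format(String nsInfo);
}

// Hypothetical helper that would run at the start of createNewUniqueEpoch().
final class FormatNewJournals {
    private FormatNewJournals() {}

    /** Returns the loggers that were formatted; empty if the check was skipped. */
    static List<JournalLogger> formatUnformatted(
            List<JournalLogger> loggers, String nsInfo, long timeoutMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        // Send isFormatted() to every journal node in parallel.
        Map<JournalLogger, CompletableFuture<Boolean>> calls = new LinkedHashMap<>();
        for (JournalLogger logger : loggers) {
            calls.put(logger, logger.isFormatted());
        }
        try {
            // Wait for all nodes to respond, but never longer than timeoutMs.
            CompletableFuture.allOf(calls.values().toArray(new CompletableFuture<?>[0]))
                    .get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // Timed out or a node failed: skip this step; the epoch workflow continues.
            return List.of();
        }
        List<JournalLogger> formatted = new ArrayList<>();
        for (Map.Entry<JournalLogger, CompletableFuture<Boolean>> entry : calls.entrySet()) {
            if (!entry.getValue().join()) {
                // This node has no data for the namespace: format it with nsInfo.
                entry.getKey().format(nsInfo).get(timeoutMs, TimeUnit.MILLISECONDS);
                formatted.add(entry.getKey());
            }
        }
        return formatted;
    }
}
```

Because all isFormatted() calls are issued before waiting, the added latency in the normal case (question 2 below) is roughly one round trip to the slowest node, capped by timeoutMs.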
However, I have a couple of questions:
1) Will this extra step, with its wait time, cause any trouble for the new
active QJM?
2) Would this extra step introduce significant overhead under normal
conditions, when no journal node needs to be formatted?
3) Since in our case we need to restart the NameNodes after a new journal
node is in place, createNewUniqueEpoch() should be called once, which would
format the new journal node. Is this assumption valid?
4) Once a new journal node is formatted, are any extra steps needed to make it
sync data from the other peers? Or is this already handled by the quorum
protocol?
Thanks.
> QJM: improve formatting behavior for JNs
> ----------------------------------------
>
> Key: HDFS-3743
> URL: https://issues.apache.org/jira/browse/HDFS-3743
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: QuorumJournalManager (HDFS-3077)
> Reporter: Todd Lipcon
>
> Currently, the JournalNodes automatically format themselves when a new writer
> takes over, if they don't have any data for that namespace. However, this has
> a few problems:
> 1) if the administrator accidentally points a new NN at the wrong quorum (e.g.,
> one corresponding to another cluster), it will auto-format a directory on those
> nodes. This doesn't cause any data loss, but it would be better to bail out
> with an error indicating that they need to be formatted.
> 2) if a journal node crashes and needs to be reformatted, it should be able
> to re-join the cluster and start storing new segments without having to fail
> over to a new NN.
> 3) if 2/3 JNs get accidentally reformatted (e.g., the mount point becomes
> unmounted), and the user starts the NN, it should fail to start, because it may
> end up missing edits. If it auto-formats in this case, the user might
> experience a silent "rollback" of the most recent edits.
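The three cases in the description can be condensed into a decision on how many JNs report being formatted. The sketch below is one possible reading, not the behavior of any released QJM: FormatPolicy and its Action values are hypothetical names, and it assumes "majority" means more than half of the configured JNs.

```java
// Hypothetical policy sketch for the three cases in the HDFS-3743 description.
// Inputs are counts only; how they are gathered (e.g., an isFormatted() quorum call)
// is out of scope here.
final class FormatPolicy {
    enum Action {
        PROCEED,            // every JN is formatted
        FORMAT_MISSING,     // a minority is unformatted: format them so they can rejoin (case 2)
        REFUSE_UNFORMATTED, // no JN is formatted: possibly the wrong quorum, require an explicit format (case 1)
        FAIL_TO_START       // formatted JNs are not a majority: edits may be missing (case 3)
    }

    private FormatPolicy() {}

    static Action decide(int totalJournals, int formattedJournals) {
        if (totalJournals <= 0 || formattedJournals < 0 || formattedJournals > totalJournals) {
            throw new IllegalArgumentException("invalid journal counts");
        }
        int majority = totalJournals / 2 + 1;
        if (formattedJournals == totalJournals) {
            return Action.PROCEED;
        }
        if (formattedJournals == 0) {
            return Action.REFUSE_UNFORMATTED;
        }
        if (formattedJournals >= majority) {
            return Action.FORMAT_MISSING;
        }
        return Action.FAIL_TO_START;
    }
}
```

With three JNs, this proceeds when all are formatted, formats a single reformatted JN so it can rejoin (case 2), refuses to start when two were reformatted (case 3), and asks for an explicit format when none are formatted (case 1). Under this reading, the auto-format pre-step proposed in the comment would only be allowed in the FORMAT_MISSING case.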
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)