[ https://issues.apache.org/jira/browse/HDFS-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235906#comment-15235906 ]
Jian Fang commented on HDFS-3743:
---------------------------------
I implemented the "-newEditsOnly" option and tested it on a cluster with 3
instances that ran server-side daemons such as name nodes, resource managers,
journal nodes, and others. I replaced two of the three instances successfully,
with the name node running fine, by issuing the command "hdfs namenode
-initializeSharedEdits -newEditsOnly" before starting a name node. However, I
ran into an issue when I replaced the last remaining original instance, i.e.,
when replacing the original m1, m2, and m3 with m4, m5, and m6 one at a time.
The name node got stuck loading the fsimage.
The error message was as follows.
"
2016-04-04 06:51:34,612 INFO
org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode (main): Loading 1
INodes.
2016-04-04 06:51:34,670 INFO
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf (main): Loaded
FSImage in 0 seconds.
2016-04-04 06:51:34,670 INFO org.apache.hadoop.hdfs.server.namenode.FSImage
(main): Loaded image for txid 0 from
/mnt/var/lib/hadoop/dfs-name/current/fsimage_0000000000000000000
2016-04-04 06:51:34,678 INFO org.apache.hadoop.hdfs.server.namenode.FSImage
(main): Reading
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@3af6b48f
expecting start txid #1
2016-04-04 06:51:34,678 INFO org.apache.hadoop.hdfs.server.namenode.FSImage
(main): Start loading edits file
http://ip-10-171-94-13.ec2.internal:8480/getJournal?jid=my-cluster&segmentTxId=19&storageInfo=-63%3A1708928638%3A0%3ACID-218c0d5a-8772-4a75-8ede-44f84d24f280
2016-04-04 06:51:34,683 INFO
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream (main):
Fast-forwarding stream
'http://ip-10-171-94-13.ec2.internal:8480/getJournal?jid=my-cluster&segmentTxId=19&storageInfo=-63%3A1708928638%3A0%3ACID-218c0d5a-8772-4a75-8ede-44f84d24f280'
to transaction ID 1
2016-04-04 06:51:34,683 INFO
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream (main):
Fast-forwarding stream
'http://ip-10-171-94-13.ec2.internal:8480/getJournal?jid=my-cluster&segmentTxId=19&storageInfo=-63%3A1708928638%3A0%3ACID-218c0d5a-8772-4a75-8ede-44f84d24f280'
to transaction ID 1
2016-04-04 06:51:34,845 WARN
org.apache.hadoop.hdfs.server.namenode.FSNamesystem (main): Encountered
exception loading fsimage
java.io.IOException: There appears to be a gap in the edit log. We expected
txid 1, but got txid 19.
at
org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
at
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:837)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:692)
at
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:814)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:798)
at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1491)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1557)
"
It seems the fsimage got messed up during the replacement. Could someone shed
some light on what could possibly be wrong?
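My reading of the trace is that the fsimage on disk is still at txid 0, so the
loader expects edits to resume at txid 1, while the first segment the shared
edits dir can serve starts at txid 19, so the continuity check fails. The
invariant is conceptually just the following; this is a minimal illustration
only, not the actual FSEditLogLoader code, and the class name is made up.
----------------------------------------------------------------------------------------------
import java.io.IOException;

// Illustration only: the edit-log continuity invariant behind the
// "gap in the edit log" error. Not the actual FSEditLogLoader implementation.
public class EditLogGapCheck {

  static void checkContinuity(long lastImageTxId, long firstSegmentTxId)
      throws IOException {
    long expectedTxId = lastImageTxId + 1;
    if (firstSegmentTxId != expectedTxId) {
      throw new IOException("There appears to be a gap in the edit log. We expected txid "
          + expectedTxId + ", but got txid " + firstSegmentTxId + ".");
    }
  }

  public static void main(String[] args) throws IOException {
    // My case: fsimage_0000000000000000000 is at txid 0, but the first
    // shared-edits segment starts at txid 19, so this throws.
    checkContinuity(0, 19);
  }
}
----------------------------------------------------------------------------------------------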
In my "-newEditsOnly" logic, I skipped the following code snippet in the
initializeSharedEdits() method, because the other name node could still be
running, and could even be the active one, when the above command was issued. I
assumed that the newly formatted journal node could rely on the recovery
protocol in QJM to sync up with its peers for the latest transactions.
----------------------------------------------------------------------------------------------
NNStorage newSharedStorage = sharedEditsImage.getStorage();
// Call Storage.format instead of FSImage.format here, since we don't
// actually want to save a checkpoint - just prime the dirs with
// the existing namespace info
newSharedStorage.format(nsInfo);
sharedEditsImage.getEditLog().formatNonFileJournals(nsInfo);
// Need to make sure the edit log segments are in good shape to initialize
// the shared edits dir.
fsns.getFSImage().getEditLog().close();
fsns.getFSImage().getEditLog().initJournalsForWrite();
fsns.getFSImage().getEditLog().recoverUnclosedStreams();
copyEditLogSegmentsToSharedDir(fsns, sharedEditsDirs, newSharedStorage, conf);
----------------------------------------------------------------------------------------------
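With the option set, the shape of my change is essentially a guard around that
block, along the lines of the simplified sketch below; the newEditsOnly boolean
stands for whatever the "-newEditsOnly" startup argument sets and is
illustrative, not the literal patch.
----------------------------------------------------------------------------------------------
// Illustrative sketch, not the literal patch: when -newEditsOnly is given,
// initializeSharedEdits() neither primes the shared dirs from this NN nor
// copies its existing edit log segments; a freshly formatted journal node is
// expected to catch up via QJM's segment-recovery protocol instead.
if (!newEditsOnly) {
  NNStorage newSharedStorage = sharedEditsImage.getStorage();
  newSharedStorage.format(nsInfo);
  sharedEditsImage.getEditLog().formatNonFileJournals(nsInfo);
  fsns.getFSImage().getEditLog().close();
  fsns.getFSImage().getEditLog().initJournalsForWrite();
  fsns.getFSImage().getEditLog().recoverUnclosedStreams();
  copyEditLogSegmentsToSharedDir(fsns, sharedEditsDirs, newSharedStorage, conf);
}
----------------------------------------------------------------------------------------------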
On the newly formatted journal node, I did observe that the segments were
recovered from its peers successfully, for example:
2016-04-11 19:06:38,894 WARN org.apache.hadoop.hdfs.server.common.Storage (IPC Server handler 0 on 8485): Storage directory /mnt/var/lib/hadoop/journal/local/data/my-cluster does not exist
2016-04-11 19:06:39,170 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler 1 on 8485): Formatting org.apache.hadoop.hdfs.qjournal.server.Journal@21e17c6d with namespace info: lv=-63;cid=CID-72db83e1-fe0c-418a-a861-d9525f73adc8;nsid=1427415229;c=0;bpid=BP-1127919701-10.236.146.62-1460399925301
2016-04-11 19:06:39,170 INFO org.apache.hadoop.hdfs.server.common.Storage (IPC Server handler 1 on 8485): Formatting journal Storage Directory /mnt/var/lib/hadoop/journal/local/data/my-cluster with nsid: 1427415229
2016-04-11 19:06:39,216 INFO org.apache.hadoop.hdfs.server.common.Storage (IPC Server handler 1 on 8485): Lock on /mnt/var/lib/hadoop/journal/local/data/my-cluster/in_use.lock acquired by nodename [email protected]
2016-04-11 19:06:49,227 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler 3 on 8485): Updating lastPromisedEpoch from 0 to 3 for client /10.139.37.248
2016-04-11 19:06:49,274 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler 3 on 8485): Scanning storage FileJournalManager(root=/mnt/var/lib/hadoop/journal/local/data/my-cluster)
2016-04-11 19:06:49,294 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler 3 on 8485): No files in FileJournalManager(root=/mnt/var/lib/hadoop/journal/local/data/my-cluster)
2016-04-11 19:06:49,356 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler 4 on 8485): Prepared recovery for segment 1667: lastWriterEpoch: 0
2016-04-11 19:06:49,545 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler 0 on 8485): Synchronizing log startTxId: 1667 endTxId: 3104 isInProgress: true: no current segment in place
2016-04-11 19:06:49,553 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler 0 on 8485): Synchronizing log startTxId: 1667 endTxId: 3104 isInProgress: true from http://ip-10-139-37-248.ec2.internal:8480/getJournal?jid=my-cluster&segmentTxId=1667&storageInfo=-63%3A1427415229%3A0%3ACID-72db83e1-fe0c-418a-a861-d9525f73adc8
2016-04-11 19:06:50,336 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage (IPC Server handler 0 on 8485): Image Transfer timeout configured to 60000 milliseconds
2016-04-11 19:06:50,616 INFO org.apache.hadoop.hdfs.server.namenode.TransferFsImage (IPC Server handler 0 on 8485): Transfer took 0.25s at 4079.68 KB/s
2016-04-11 19:06:50,632 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler 0 on 8485): Accepted recovery for segment 1667: segmentState { startTxId: 1667 endTxId: 3104 isInProgress: true } acceptedInEpoch: 3
2016-04-11 19:06:50,646 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler 2 on 8485): Validating log segment /mnt/var/lib/hadoop/journal/local/data/my-cluster/current/edits_inprogress_0000000000000001667 about to be finalized
2016-04-11 19:06:50,807 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager (IPC Server handler 2 on 8485): Finalizing edits file /mnt/var/lib/hadoop/journal/local/data/my-cluster/current/edits_inprogress_0000000000000001667 -> /mnt/var/lib/hadoop/journal/local/data/my-cluster/current/edits_0000000000000001667-0000000000000003104
2016-04-11 19:06:50,845 INFO org.apache.hadoop.hdfs.qjournal.server.Journal (IPC Server handler 4 on 8485): Updating lastWriterEpoch from 0 to 3 for client /10.139.37.248
Did I miss anything that could mess up the fsimage files?
Thanks in advance.
> QJM: improve formatting behavior for JNs
> ----------------------------------------
>
> Key: HDFS-3743
> URL: https://issues.apache.org/jira/browse/HDFS-3743
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Affects Versions: QuorumJournalManager (HDFS-3077)
> Reporter: Todd Lipcon
>
> Currently, the JournalNodes automatically format themselves when a new writer
> takes over, if they don't have any data for that namespace. However, this has
> a few problems:
> 1) if the administrator accidentally points a new NN at the wrong quorum (eg
> corresponding to another cluster), it will auto-format a directory on those
> nodes. This doesn't cause any data loss, but it would be better to bail out
> with an error indicating that they need to be formatted.
> 2) if a journal node crashes and needs to be reformatted, it should be able
> to re-join the cluster and start storing new segments without having to fail
> over to a new NN.
> 3) if 2/3 JNs get accidentally reformatted (eg the mount point becomes
> undone), and the user starts the NN, it should fail to start, because it may
> end up missing edits. If it auto-formats in this case, the user might get a
> silent "rollback" of the most recent edits.