[ 
https://issues.apache.org/jira/browse/HDFS-3743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235906#comment-15235906
 ] 

Jian Fang commented on HDFS-3743:
---------------------------------

I implemented the "-newEditsOnly" option and tested it on a cluster with 3 
instances that ran the server-side daemons such as name nodes, resource 
managers, and journal nodes. I successfully replaced two of the three 
instances, with the name node running fine, by issuing the command "hdfs 
namenode -initializeSharedEdits -newEditsOnly" before starting a name node. 
However, I ran into an issue when I replaced the last existing instance, i.e., 
when replacing the original m1, m2, and m3 with m4, m5, and m6 one at a time, 
the name node got stuck loading the fsimage.

The error message was as follows.

"
2016-04-04 06:51:34,612 INFO 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatPBINode (main): Loading 1 
INodes.
2016-04-04 06:51:34,670 INFO 
org.apache.hadoop.hdfs.server.namenode.FSImageFormatProtobuf (main): Loaded 
FSImage in 0 seconds.
2016-04-04 06:51:34,670 INFO org.apache.hadoop.hdfs.server.namenode.FSImage 
(main): Loaded image for txid 0 from 
/mnt/var/lib/hadoop/dfs-name/current/fsimage_0000000000000000000
2016-04-04 06:51:34,678 INFO org.apache.hadoop.hdfs.server.namenode.FSImage 
(main): Reading 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@3af6b48f 
expecting start txid #1
2016-04-04 06:51:34,678 INFO org.apache.hadoop.hdfs.server.namenode.FSImage 
(main): Start loading edits file 
http://ip-10-171-94-13.ec2.internal:8480/getJournal?jid=my-cluster&segmentTxId=19&storageInfo=-63%3A1708928638%3A0%3ACID-218c0d5a-8772-4a75-8ede-44f84d24f280
2016-04-04 06:51:34,683 INFO 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream (main): 
Fast-forwarding stream 
'http://ip-10-171-94-13.ec2.internal:8480/getJournal?jid=my-cluster&segmentTxId=19&storageInfo=-63%3A1708928638%3A0%3ACID-218c0d5a-8772-4a75-8ede-44f84d24f280'
 to transaction ID 1
2016-04-04 06:51:34,683 INFO 
org.apache.hadoop.hdfs.server.namenode.EditLogInputStream (main): 
Fast-forwarding stream 
'http://ip-10-171-94-13.ec2.internal:8480/getJournal?jid=my-cluster&segmentTxId=19&storageInfo=-63%3A1708928638%3A0%3ACID-218c0d5a-8772-4a75-8ede-44f84d24f280'
 to transaction ID 1
2016-04-04 06:51:34,845 WARN 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem (main): Encountered 
exception loading fsimage
java.io.IOException: There appears to be a gap in the edit log. We expected 
txid 1, but got txid 19.
at 
org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:215)
at 
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:837)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadFSImage(FSImage.java:692)
at 
org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:294)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:975)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:681)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:585)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:645)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:814)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:798)
at 
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1491)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1557)
"
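The failure is a transaction-ID contiguity check: the name node loaded 
fsimage_0000000000000000000 (txid 0) and therefore expected edits to resume at 
txid 1, but the first segment available in the journals starts at txid 19. A 
minimal sketch of that kind of gap check (a hypothetical simplification for 
illustration, not the actual FSEditLogLoader code):

```python
def check_edit_log_gap(last_image_txid, segments):
    """Sketch of the contiguity check: after loading an fsimage at
    last_image_txid, edits must resume at last_image_txid + 1 with no
    gaps between segments. segments is a list of (start_txid, end_txid)."""
    expected = last_image_txid + 1
    for start, end in sorted(segments):
        if start != expected:
            raise IOError(
                "There appears to be a gap in the edit log. "
                "We expected txid %d, but got txid %d." % (expected, start))
        expected = end + 1
    return expected - 1  # highest txid applied
```

With an fsimage at txid 0 and only a segment starting at txid 19 available, 
this raises exactly the kind of error shown above; it passes if a segment 
covering txids 1-18 is also present.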

It seems the fsimage was messed up during the replacement. Could someone shed 
some light on what could possibly be wrong? 

In my "-newEditsOnly" logic, I skipped the following code snippet in the 
initializeSharedEdits() method, because the other name node could still be 
running, and could be the active one, when the above command was issued. I 
assumed that the newly formatted journal node could rely on the QJM recovery 
protocol to sync up with its peers for the latest transactions.

----------------------------------------------------------------------------------------------
        NNStorage newSharedStorage = sharedEditsImage.getStorage();

        // Call Storage.format instead of FSImage.format here, since we don't
        // actually want to save a checkpoint - just prime the dirs with
        // the existing namespace info
        newSharedStorage.format(nsInfo);
        sharedEditsImage.getEditLog().formatNonFileJournals(nsInfo);

        // Need to make sure the edit log segments are in good shape to
        // initialize the shared edits dir.
        fsns.getFSImage().getEditLog().close();
        fsns.getFSImage().getEditLog().initJournalsForWrite();
        fsns.getFSImage().getEditLog().recoverUnclosedStreams();

        copyEditLogSegmentsToSharedDir(fsns, sharedEditsDirs, newSharedStorage,
            conf);
--------------------------------------------------------------------------------------------
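For reference, the recovery behavior I relied on instead (a freshly formatted 
journal node promises the writer's new epoch and then syncs missing segments 
from its peers, as in the "Updating lastPromisedEpoch from 0 to 3" and 
"Synchronizing log" lines below) can be modeled roughly as follows. This is a 
toy sketch; the class, method names, and fields are illustrative, not the 
actual Journal implementation:

```python
class ToyJournalNode:
    """Toy model of the QJM promise/recovery handshake: a freshly
    formatted JN starts with lastPromisedEpoch=0 and no segments, promises
    the writer's epoch, and then accepts segments copied from peers."""

    def __init__(self):
        self.last_promised_epoch = 0
        self.segments = {}  # start_txid -> (end_txid, data)

    def new_epoch(self, epoch):
        # Refuse epochs that are not strictly newer than what was promised.
        if epoch <= self.last_promised_epoch:
            raise IOError("Proposed epoch %d <= promised epoch %d"
                          % (epoch, self.last_promised_epoch))
        self.last_promised_epoch = epoch
        # Report the last segment txid so the writer can drive recovery;
        # a freshly formatted JN has none.
        return max(self.segments) if self.segments else None

    def sync_segment(self, start_txid, end_txid, data):
        # Recovery: copy a segment from a peer JN into local storage.
        self.segments[start_txid] = (end_txid, data)
```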
On the newly formatted journal node, I did observe that the segments were 
recovered from its peers successfully. For example:

2016-04-11 19:06:38,894 WARN org.apache.hadoop.hdfs.server.common.Storage (IPC 
Server handler 0 on 8485): Storage directory 
/mnt/var/lib/hadoop/journal/local/data/my-cluster does not exist
2016-04-11 19:06:39,170 INFO org.apache.hadoop.hdfs.qjournal.server.Journal 
(IPC Server handler 1 on 8485): Formatting 
org.apache.hadoop.hdfs.qjournal.server.Journal@21e17c6d with namespace info: 
lv=-63;cid=CID-72db83e1-fe0c-418a-a861-d9525f73adc8;nsid=1427415229;c=0;bpid=BP-1127919701-10.236.146.62-1460399925301
2016-04-11 19:06:39,170 INFO org.apache.hadoop.hdfs.server.common.Storage (IPC 
Server handler 1 on 8485): Formatting journal Storage Directory 
/mnt/var/lib/hadoop/journal/local/data/my-cluster with nsid: 1427415229
2016-04-11 19:06:39,216 INFO org.apache.hadoop.hdfs.server.common.Storage (IPC 
Server handler 1 on 8485): Lock on 
/mnt/var/lib/hadoop/journal/local/data/my-cluster/in_use.lock acquired by 
nodename [email protected]
2016-04-11 19:06:49,227 INFO org.apache.hadoop.hdfs.qjournal.server.Journal 
(IPC Server handler 3 on 8485): Updating lastPromisedEpoch from 0 to 3 for 
client /10.139.37.248
2016-04-11 19:06:49,274 INFO org.apache.hadoop.hdfs.qjournal.server.Journal 
(IPC Server handler 3 on 8485): Scanning storage 
FileJournalManager(root=/mnt/var/lib/hadoop/journal/local/data/my-cluster)
2016-04-11 19:06:49,294 INFO org.apache.hadoop.hdfs.qjournal.server.Journal 
(IPC Server handler 3 on 8485): No files in 
FileJournalManager(root=/mnt/var/lib/hadoop/journal/local/data/my-cluster)
2016-04-11 19:06:49,356 INFO org.apache.hadoop.hdfs.qjournal.server.Journal 
(IPC Server handler 4 on 8485): Prepared recovery for segment 1667: 
lastWriterEpoch: 0
2016-04-11 19:06:49,545 INFO org.apache.hadoop.hdfs.qjournal.server.Journal 
(IPC Server handler 0 on 8485): Synchronizing log startTxId: 1667 endTxId: 3104 
isInProgress: true: no current segment in place
2016-04-11 19:06:49,553 INFO org.apache.hadoop.hdfs.qjournal.server.Journal 
(IPC Server handler 0 on 8485): Synchronizing log startTxId: 1667 endTxId: 3104 
isInProgress: true from 
http://ip-10-139-37-248.ec2.internal:8480/getJournal?jid=my-cluster&segmentTxId=1667&storageInfo=-63%3A1427415229%3A0%3ACID-72db83e1-fe0c-418a-a861-d9525f73adc8
2016-04-11 19:06:50,336 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage (IPC Server handler 0 on 
8485): Image Transfer timeout configured to 60000 milliseconds
2016-04-11 19:06:50,616 INFO 
org.apache.hadoop.hdfs.server.namenode.TransferFsImage (IPC Server handler 0 on 
8485): Transfer took 0.25s at 4079.68 KB/s
2016-04-11 19:06:50,632 INFO org.apache.hadoop.hdfs.qjournal.server.Journal 
(IPC Server handler 0 on 8485): Accepted recovery for segment 1667: 
segmentState { startTxId: 1667 endTxId: 3104 isInProgress: true } 
acceptedInEpoch: 3
2016-04-11 19:06:50,646 INFO org.apache.hadoop.hdfs.qjournal.server.Journal 
(IPC Server handler 2 on 8485): Validating log segment 
/mnt/var/lib/hadoop/journal/local/data/my-cluster/current/edits_inprogress_0000000000000001667
 about to be finalized
2016-04-11 19:06:50,807 INFO 
org.apache.hadoop.hdfs.server.namenode.FileJournalManager (IPC Server handler 2 
on 8485): Finalizing edits file 
/mnt/var/lib/hadoop/journal/local/data/my-cluster/current/edits_inprogress_0000000000000001667
 -> 
/mnt/var/lib/hadoop/journal/local/data/my-cluster/current/edits_0000000000000001667-0000000000000003104
2016-04-11 19:06:50,845 INFO org.apache.hadoop.hdfs.qjournal.server.Journal 
(IPC Server handler 4 on 8485): Updating lastWriterEpoch from 0 to 3 for client 
/10.139.37.248

Did I miss anything that could have messed up the fsimage files?

Thanks in advance.

> QJM: improve formatting behavior for JNs
> ----------------------------------------
>
>                 Key: HDFS-3743
>                 URL: https://issues.apache.org/jira/browse/HDFS-3743
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: QuorumJournalManager (HDFS-3077)
>            Reporter: Todd Lipcon
>
> Currently, the JournalNodes automatically format themselves when a new writer 
> takes over, if they don't have any data for that namespace. However, this has 
> a few problems:
> 1) if the administrator accidentally points a new NN at the wrong quorum (eg 
> corresponding to another cluster), it will auto-format a directory on those 
> nodes. This doesn't cause any data loss, but would be better to bail out with 
> an error indicating that they need to be formatted.
> 2) if a journal node crashes and needs to be reformatted, it should be able 
> to re-join the cluster and start storing new segments without having to fail 
> over to a new NN.
> 3) if 2/3 JNs get accidentally reformatted (eg the mount point becomes 
> undone), and the user starts the NN, it should fail to start, because it may 
> end up missing edits. If it auto-formats in this case, the user might have 
> silent "rollback" of the most recent edits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
