[
https://issues.apache.org/jira/browse/HDFS-16950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17780001#comment-17780001
]
Wei-Chiu Chuang commented on HDFS-16950:
----------------------------------------
Karthik said the missing edit logs caused data loss, and the problem is
reproducible.
A workaround would be to put the NN into safe mode and take a checkpoint before
proceeding with the migration.
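
As a rough sketch of that workaround (not a tested procedure), the two steps
map onto the standard "hdfs dfsadmin -safemode enter" and
"hdfs dfsadmin -saveNamespace" commands. The Python wrapper, its function name,
and the dry_run flag are illustrative assumptions; the commands need HDFS
superuser rights and, in an HA setup, should target the active NN.

```python
import subprocess

# Standard HDFS admin commands for the workaround, in order.
PRE_MIGRATION_STEPS = [
    ["hdfs", "dfsadmin", "-safemode", "enter"],  # stop namespace modifications
    ["hdfs", "dfsadmin", "-saveNamespace"],      # write a fresh fsimage (checkpoint)
]


def checkpoint_before_migration(dry_run=True):
    """Hypothetical helper: return the planned commands as strings.

    The commands are executed only when dry_run is False; check=True makes
    the sequence stop at the first command that fails.
    """
    for cmd in PRE_MIGRATION_STEPS:
        if not dry_run:
            subprocess.run(cmd, check=True)
    return [" ".join(cmd) for cmd in PRE_MIGRATION_STEPS]
```

Calling checkpoint_before_migration() with the default dry_run=True only lists
the commands, which is a cheap way to review the plan before touching the
cluster. Safe mode has to be left again ("hdfs dfsadmin -safemode leave") once
the migration is done.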
> Gap in edits after -initializeSharedEdits
> -----------------------------------------
>
> Key: HDFS-16950
> URL: https://issues.apache.org/jira/browse/HDFS-16950
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: journal-node, namenode
> Reporter: Karthik Palanisamy
> Priority: Critical
>
> The NameNode failed in a production cluster when the JN role was migrated.
> {code:java}
> ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start
> namenode.
> java.io.IOException: There appears to be a gap in the edit log. We expected
> txid xxxxxx, but got txid xxxxxx. {code}
> -initializeSharedEdits was issued as part of the role migration step. Note
> that no checkpoint had been performed in the previous few hours.
> -initializeSharedEdits created a new log segment from the edits_inprogress
> transaction and deleted all old transactions.
> My ask here is to delete only edit transactions older than the fsimage
> transaction. Currently, it deletes all transactions, and no check is
> enforced in JNStorage#format().
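
To make the requested rule concrete, here is a small Python model of it. It is
not the JNStorage code: the segment tuples and both function names are
hypothetical, and the only assumption about HDFS is that the NameNode loads an
fsimage covering transactions up to some txid and then expects edits starting
at the next txid.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # (first_txid, last_txid), both inclusive


def purgeable_segments(segments: List[Segment], fsimage_txid: int) -> List[Segment]:
    """Segments whose every transaction is already covered by the fsimage."""
    return [seg for seg in segments if seg[1] <= fsimage_txid]


def first_gap(segments: List[Segment], fsimage_txid: int) -> Optional[Tuple[int, int]]:
    """Return (expected_txid, found_txid) for the first gap after the fsimage.

    Returns None when the segments continue the fsimage without a hole.
    """
    expected = fsimage_txid + 1
    for first, last in sorted(segments):
        if last < expected:
            continue  # fully covered by the checkpoint, irrelevant for replay
        if first > expected:
            return (expected, first)  # transactions expected..first-1 are missing
        expected = last + 1
    return None
```

With an fsimage at txid 100 and segments (1, 100), (101, 200), (201, 300),
only (1, 100) is purgeable and first_gap returns None. If everything except
the newest segment is deleted, first_gap([(201, 300)], 100) returns
(101, 201), which corresponds to the "expected txid ..., but got txid ..."
failure quoted above.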