[
https://issues.apache.org/jira/browse/HDFS-15114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17042956#comment-17042956
]
Steven Rand commented on HDFS-15114:
------------------------------------
Hi [~ayushtkn],
Unfortunately, I don't know how to reproduce what happened. We've performed
rolling restarts of the JournalNodes plenty of times without this happening.
I know that makes this JIRA very hard or impossible for anyone to take action
on, but I thought it was good to have the report anyway in case other people
see similar issues.
> JournalNodes' committed-txid file includes aborted transaction, breaks
> NameNode startup
> ---------------------------------------------------------------------------------------
>
> Key: HDFS-15114
> URL: https://issues.apache.org/jira/browse/HDFS-15114
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: journal-node, namenode
> Affects Versions: 3.2.1
> Reporter: Steven Rand
> Priority: Major
>
> A couple of days ago, our active NameNode in an HA setup aborted a
> {{QuorumOutputStream}} starting at tx 3389424 because tx 3389425 failed to be
> written. This was likely related to a rolling restart of the three
> JournalNodes that was happening at this time. The NameNode logged:
> {code:java}
> 2020-01-11 02:00:50,229 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 10.6.1.181
> 2020-01-11 02:00:50,229 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
> 2020-01-11 02:00:50,229 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 3389424, 3389424
> 2020-01-11 02:00:50,229 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 1 SyncTimes(ms): 1 7
> 2020-01-11 02:00:50,245 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal 10.6.2.187:8485 failed to write txns 3389425-3389425. Will try to write to this JN again after the next log roll.
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.qjournal.protocol.JournalOutOfSyncException): Can't write, no segment open ; journal id: <journal_name>
> at org.apache.hadoop.hdfs.qjournal.server.Journal.checkSync(Journal.java:545)
> ... rest of stacktrace ...
> // the same warning for the second JournalNode
> // the same warning for the third JournalNode
> 2020-01-11 02:00:50,246 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.6.1.4:8485, 10.6.1.181:8485, 10.6.2.187:8485], stream=QuorumOutputStream starting at txid 3389424))
> org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
> // the same "Can't write, no segment open ; journal id: <journal_name>" error for all 3 JournalNodes
> 2020-01-11 02:00:50,246 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 3389424
> 2020-01-11 02:00:50,255 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [10.6.1.4:8485, 10.6.1.181:8485, 10.6.2.187:8485], stream=QuorumOutputStream starting at txid 3389424))
> {code}
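> For readers less familiar with QJM internals: each batch of edits must be acknowledged by a majority of the JournalNodes before the flush is considered successful, and a failed flush of a required journal is fatal to the NameNode. The sketch below is only a toy model of that quorum requirement (it is not the actual QuorumCall/FSEditLog code, and the names are made up):
> {code:java}
> /** Toy model of the quorum requirement behind "Got too many exceptions to achieve quorum size 2/3". */
> final class QuorumFlushModel {
>   static void flush(int numJournalNodes, int successfulAcks) {
>     int quorum = numJournalNodes / 2 + 1; // majority, e.g. 2 of 3 JournalNodes
>     if (successfulAcks < quorum) {
>       // The QuorumOutputStream is aborted and, because the shared edits journal
>       // is a required journal, the NameNode exits with status 1.
>       throw new IllegalStateException("Got too many exceptions to achieve quorum size "
>           + quorum + "/" + numJournalNodes);
>     }
>   }
>
>   public static void main(String[] args) {
>     try {
>       flush(3, 0); // all three JNs rejected txns 3389425-3389425
>     } catch (IllegalStateException e) {
>       System.out.println(e.getMessage());
>     }
>   }
> }
> {code}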
> Even though the stream was aborted, the {{committed-txid}} file on each of
> the three JournalNodes was updated to be {{3389424}}.
> This caused both NameNodes to fail to start with this error:
>
> {code:java}
> 2020-01-11 02:54:35,483 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state
> 2020-01-11 02:54:35,491 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Starting recovery process for unclosed journal segments...
> 2020-01-11 02:54:35,537 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Successfully started new epoch 80
> 2020-01-11 02:54:35,537 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Beginning recovery of unclosed segment starting at txid 3389422
> 2020-01-11 02:54:35,574 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Recovery prepare phase complete. Responses:
> 10.6.1.4:8485: segmentState { startTxId: 3389422 endTxId: 3389423 isInProgress: false } lastWriterEpoch: 57 lastCommittedTxId: 3389424
> 10.6.2.187:8485: segmentState { startTxId: 3389422 endTxId: 3389423 isInProgress: false } lastWriterEpoch: 57 lastCommittedTxId: 3389424
> 2020-01-11 02:54:35,575 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Using longest log: 10.6.1.4:8485=segmentState {
> startTxId: 3389422
> endTxId: 3389423
> isInProgress: false
> }
> lastWriterEpoch: 57
> lastCommittedTxId: 3389424
> 2020-01-11 02:54:35,575 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.6.1.4:8485, 10.6.1.181:8485, 10.6.2.187:8485], stream=null))
> java.lang.AssertionError: Decided to synchronize log to startTxId: 3389422
> endTxId: 3389423
> isInProgress: false
> but logger 10.6.1.4:8485 had seen txid 3389424 committed
> ... rest of stacktrace ...
> 2020-01-11 02:54:35,577 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: Error: recoverUnfinalizedSegments failed for required journal (JournalAndStream(mgr=QJM to [10.6.1.4:8485, 10.6.1.181:8485, 10.6.2.187:8485], stream=null))
> {code}
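> The {{AssertionError}} above comes from a sanity check in the QJM recovery path: after the prepare phase, the client picks the best segment to synchronize to and then verifies that no JournalNode has reported a committed txid beyond the end of that segment. Below is a simplified sketch of that kind of check (not the actual {{QuorumJournalManager}} code; the names are illustrative):
> {code:java}
> import java.util.Map;
>
> /** Simplified model of the recovery-time sanity check that failed above. */
> final class RecoverySanityCheck {
>   static void checkNoFutureCommit(long chosenSegmentEndTxId,
>                                   Map<String, Long> lastCommittedTxIdByJournalNode) {
>     for (Map.Entry<String, Long> e : lastCommittedTxIdByJournalNode.entrySet()) {
>       // If any JN reports a committed txid past the end of the segment being
>       // synchronized, the shared edit log is inconsistent and recovery stops.
>       if (e.getValue() > chosenSegmentEndTxId) {
>         throw new AssertionError("Decided to synchronize log to endTxId: "
>             + chosenSegmentEndTxId + " but logger " + e.getKey()
>             + " had seen txid " + e.getValue() + " committed");
>       }
>     }
>   }
>
>   public static void main(String[] args) {
>     // Values from the report: the longest recoverable segment ends at 3389423,
>     // but the committed-txid file on the JournalNodes says 3389424.
>     try {
>       checkNoFutureCommit(3389423L,
>           Map.of("10.6.1.4:8485", 3389424L, "10.6.2.187:8485", 3389424L));
>     } catch (AssertionError e) {
>       System.out.println(e.getMessage());
>     }
>   }
> }
> {code}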
> A potentially relevant detail is that each of the three JournalNodes logged
> this at around the time of the aborted stream:
> {code:java}
> 2020-01-11 02:00:32,649 INFO org.apache.hadoop.hdfs.qjournal.server.Journal: Latest log is EditLogFile(file=<path_to_edits>/<journal_name>/current/edits_inprogress_0000000000003389424,first=0000000000003389424,last=0000000000003389424,inProgress=true,hasCorruptHeader=false) ; journal id: <journal_name>
> {code}
> That file was later renamed to
> {{edits_inprogress_0000000000003389424.stale}}, and the JournalNodes no
> longer consider it to be the most recent edit file. Instead, they now log
> that {{edits_0000000000003389422-0000000000003389423}} is the most recent
> file they know about.
> So what seems to have happened is:
> 1. The active NameNode tries to write transactions 3389424 and 3389425 to the
> journal.
> 2. Tx 3389424 is written successfully, and the {{committed-txid}} file is
> updated to reflect this.
> 3. Tx 3389425 fails to be written, so the NameNode aborts the
> {{QuorumOutputStream}} that contained 3389424 and 3389425.
> 4. Because the stream was aborted, the JournalNodes rename
> {{edits_inprogress_0000000000003389424}} to
> {{edits_inprogress_0000000000003389424.stale}} and forget about tx 3389424.
> 5. The NameNodes fail to start because the latest committed tx according to the
> JournalNodes (which comes from the {{committed-txid}} file) is 3389424, which is
> ahead of the highest tx (3389423) still present in any edits file, and recovery
> (understandably) treats that as an illegal state.
> I believe that the NameNodes and the JournalNodes are in a consistent state,
> and that there's no corruption or data loss or anything like that. I think
> the problem is just that the {{committed-txid}} file was updated prematurely,
> and contains an incorrect value.
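> To make step 5 concrete, below is a toy model of the state each JournalNode appears to end up in under this interpretation (purely illustrative; the fields and methods do not correspond to the real {{Journal}} class): the committed-txid marker advances when a batch is acknowledged, but discarding the in-progress segment never rolls it back, so it ends up pointing past the last tx that survives in any edits file.
> {code:java}
> /** Toy model of a JournalNode's on-disk state in this incident (illustrative only). */
> final class JournalNodeStateModel {
>   long committedTxId = 3389423;      // contents of the committed-txid file
>   long highestDurableTxId = 3389423; // last txid in edits_0000000000003389422-0000000000003389423
>
>   // Step 2: tx 3389424 is acknowledged, so the committed-txid marker advances.
>   void ackTxn(long txId) { committedTxId = Math.max(committedTxId, txId); }
>
>   // Steps 3-4: the stream is aborted and the in-progress segment is renamed to
>   // edits_inprogress_0000000000003389424.stale, so tx 3389424 drops out of the
>   // recoverable log; nothing rolls committedTxId back to 3389423.
>   void discardInProgressSegment() { /* highestDurableTxId stays at 3389423 */ }
>
>   // Step 5: recovery treats a committed txid ahead of the recoverable log as an
>   // illegal state, which is exactly what the AssertionError reports.
>   boolean consistent() { return committedTxId <= highestDurableTxId; }
>
>   public static void main(String[] args) {
>     JournalNodeStateModel jn = new JournalNodeStateModel();
>     jn.ackTxn(3389424L);
>     jn.discardInProgressSegment();
>     System.out.println(jn.consistent()); // false
>   }
> }
> {code}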
> For this particular cluster, I ran {{./service/bin/hdfs namenode
> -initializeSharedEdits}} to format the JournalNodes, but I backed up the
> edits directory first in case there's more information in there that's
> helpful for debugging.