[ https://issues.apache.org/jira/browse/HDFS-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17012512#comment-17012512 ]
Tao Yang commented on HDFS-15105:
---------------------------------
Hi [~xiaochen], I noticed that you fixed an issue with a similar error in
HDFS-12369. Could you please take a look at this one? Hope to hear your
thoughts. Thanks.
> Standby NN exits and fails to restart due to edit log corruption
> ----------------------------------------------------------------
>
> Key: HDFS-15105
> URL: https://issues.apache.org/jira/browse/HDFS-15105
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.8.0
> Reporter: Tao Yang
> Priority: Critical
>
> We found an issue where the Standby NN exited and failed to restart until we
> resolved the edit log corruption.
> Error logs:
> {noformat}
> java.io.IOException: Mismatched block IDs or generation stamps, attempting to
> replace block blk_74288647857_73526148211 with blk_74288647857_73526377369 as
> block # 15/17 of /maindump/mainv10/dump_online/lasttable/20200105015500/part-319
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1019)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:431)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:885)
>         at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:866)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:234)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:342)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:295)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:312)
>         at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:455)
>         at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:308)
> {noformat}
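> For context, the check that throws this error has roughly the following shape
> (a simplified sketch, not the verbatim FSEditLogLoader#updateBlocks code; the
> class, helper name and signature here are illustrative):
> {noformat}
> import java.io.IOException;
> import org.apache.hadoop.hdfs.protocol.Block;
>
> // Sketch only: a generation-stamp change is tolerated on the last block of a
> // file (append/recovery), but a block-ID change, or a stamp change on any
> // earlier block, is treated as edit log corruption and aborts the standby NN.
> class UpdateBlocksCheckSketch {
>   static void checkReplacedBlock(Block oldBlock, Block newBlock,
>       boolean isLastBlock, int index, int total, String path)
>       throws IOException {
>     boolean idChanged = oldBlock.getBlockId() != newBlock.getBlockId();
>     boolean stampChanged =
>         oldBlock.getGenerationStamp() != newBlock.getGenerationStamp();
>     if (idChanged || (stampChanged && !isLastBlock)) {
>       throw new IOException("Mismatched block IDs or generation stamps, "
>           + "attempting to replace block " + oldBlock + " with " + newBlock
>           + " as block # " + index + "/" + total + " of " + path);
>     }
>   }
> }
> {noformat}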
> Related edit log transactions of the same file:
> {noformat}
> 1. TXID=444341628498 time=1578251449632
> OP_UPDATE_BLOCKS
> blocks: ... blk_74288647857_73526148211 blk_74454090866_73526215536
> 2. TXID=444342382774 time=1578251520740
> OP_REASSIGN_LEASE
> 3. TXID=444342401216 time=1578251522779
> OP_CLOSE
> blocks: ... blk_74288647857_73526377369 blk_74454090866_73526374095
> 4. TXID=444342401394
> OP_SET_GENSTAMP_V2
> generation stamp: 73526377369
> 5. TXID=444342401395 time=1578251522835
> OP_TRUNCATE
> 6. TXID=444342402176 time=1578251523246
> OP_CLOSE
> blocks: ... blk_74288647857_73526377369
> {noformat}
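> (For reference, the transactions above can be dumped from the corresponding
> edit segment with the offline edits viewer; the input file name below is a
> placeholder.)
> {noformat}
> hdfs oev -p xml -i edits_<start-txid>-<end-txid> -o edits.xml
> {noformat}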
> According to the edit logs, it is odd to see that stamp 73526377369 was
> generated in transaction 4 but already used in transaction 3, and that
> transaction 3 changed the last two blocks when only the last block should
> have changed.
> This problem might be produced in a complex scenario where a truncate
> operation immediately followed a recover-lease operation on the same file. A
> suspicious point is that, between the time transaction 3 was created and the
> time it was written out, the stamp of the second-last block was updated while
> committing the block synchronization triggered by the truncate operation.
> Related calling stack is as follows:
> {noformat}
> NameNodeRpcServer#commitBlockSynchronization
>   FSNamesystem#commitBlockSynchronization
>     // update last block
>     if (!copyTruncate) {
>       // updates the stamp of the stored (second-last) block in memory
>       // before the OP_CLOSE of transaction 3 is written out
>       storedBlock.setGenerationStamp(newgenerationstamp);
>       storedBlock.setNumBytes(newlength);
>     }
> {noformat}
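> To make the suspected interleaving concrete, here is a minimal standalone
> sketch (not actual Hadoop code; the class, field and values are assumptions
> tied to the transactions above):
> {noformat}
> // StoredBlock stands in for the shared in-memory block that both paths touch.
> public class GenStampRaceSketch {
>   static final class StoredBlock {
>     volatile long genStamp;
>     StoredBlock(long genStamp) { this.genStamp = genStamp; }
>   }
>
>   public static void main(String[] args) {
>     StoredBlock secondLast = new StoredBlock(73526148211L); // stamp from txn 1
>
>     // Truncate recovery path: commitBlockSynchronization bumps the stored
>     // block's stamp in place; OP_SET_GENSTAMP_V2 is only logged later (txn 4).
>     secondLast.genStamp = 73526377369L;
>
>     // Lease recovery close path: OP_CLOSE (txn 3) is built from the live
>     // in-memory blocks, so it records the already-bumped stamp -- an edit log
>     // where the stamp is used before the op that "generates" it.
>     System.out.println("OP_CLOSE records second-last block stamp = "
>         + secondLast.genStamp);
>   }
> }
> {noformat}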
> Any comments are welcome. Thanks.