[ https://issues.apache.org/jira/browse/HDFS-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145941#comment-17145941 ]
Wei-Chiu Chuang commented on HDFS-15105: ---------------------------------------- [~Tao Yang] The truncate feature is seldom used so i am afraid not many people have the experience dealing with it. > Standby NN exits and fails to restart due to edit log corruption > ---------------------------------------------------------------- > > Key: HDFS-15105 > URL: https://issues.apache.org/jira/browse/HDFS-15105 > Project: Hadoop HDFS > Issue Type: Bug > Affects Versions: 2.8.0 > Reporter: Tao Yang > Priority: Critical > > We found a issue that Standby NN exited and failed to restart until we > resolved the edit log corruption. > Error logs: > {noformat} > java.io.IOException: Mismatched block IDs or generation stamps, attempting to > replace block blk_74288647857_73526148211 with blk_74288647857_73526377369 as > block # 15/17 of > /maindump/mainv10/dump_online/lasttable/20200105015500/part-319 > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1019) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:431) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234) > at > org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:885) > at > org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:866) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:234) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:342) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:295) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:312) > at > org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:455) > at > org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:308) > {noformat} > Related edit log transactions of the same file: > {noformat} > 1. TXID=444341628498 time=1578251449632 > OP_UPDATE_BLOCKS > blocks: ... blk_74288647857_73526148211 blk_74454090866_73526215536 > 2. TXID=444342382774 time=1578251520740 > OP_REASSIGN_LEASE > 3. TXID=444342401216 time=1578251522779 > OP_CLOSE > blocks: ... blk_74288647857_73526377369 blk_74454090866_73526374095 > 4. TXID=444342401394 > OP_SET_GENSTAMP_V2 > generate stamp: 73526377369 > 5. TXID=444342401395 time=1578251522835 > OP_TRUNCATE > 6. TXID=444342402176 time=1578251523246 > OP_CLOSE > blocks: ... blk_74288647857_73526377369 > {noformat} > According to the edit logs, it's wield to see that stamp(73526377369) was > generated in transaction 4 but already used in transaction 3, and for > transaction 3 there should be only the last block changed but in fact the > last two blocks are both changed. > This problem might be produced in a complex scenario that truncate operation > immediately followed the recover-lease operation for the same file. A > suspicious point is that between creation and being written for transaction > 3, stamp of the second last block was updated when committing block > synchronization caused by the truncate operation. > Related calling stack is as follows: > {noformat} > NameNodeRpcServer#commitBlockSynchronization > FSNamesystem#commitBlockSynchronization > // update last block > if(!copyTruncate) { > storedBlock.setGenerationStamp(newgenerationstamp); //updated the stamp > of the second last block in transaction 3 before being written > storedBlock.setNumBytes(newlength); > } > {noformat} > Any comments are welcome. Thanks. -- This message was sent by Atlassian Jira (v8.3.4#803005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org