[ 
https://issues.apache.org/jira/browse/HDFS-15105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17145941#comment-17145941
 ] 

Wei-Chiu Chuang commented on HDFS-15105:
----------------------------------------

[~Tao Yang] The truncate feature is seldom used so i am afraid not many people 
have the experience dealing with it.

> Standby NN exits and fails to restart due to edit log corruption
> ----------------------------------------------------------------
>
>                 Key: HDFS-15105
>                 URL: https://issues.apache.org/jira/browse/HDFS-15105
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.8.0
>            Reporter: Tao Yang
>            Priority: Critical
>
> We found a issue that Standby NN exited and failed to restart until we 
> resolved the edit log corruption.
>  Error logs:
> {noformat}
> java.io.IOException: Mismatched block IDs or generation stamps, attempting to 
> replace block blk_74288647857_73526148211 with blk_74288647857_73526377369 as 
> block # 15/17 of 
> /maindump/mainv10/dump_online/lasttable/20200105015500/part-319
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1019)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:431)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:234)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:143)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:885)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:866)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:234)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:342)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:295)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:312)
>         at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:455)
>         at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:308)
> {noformat}
> Related edit log transactions of the same file:
> {noformat}
> 1. TXID=444341628498  time=1578251449632
> OP_UPDATE_BLOCKS
> blocks: ... blk_74288647857_73526148211   blk_74454090866_73526215536
> 2. TXID=444342382774   time=1578251520740
> OP_REASSIGN_LEASE
> 3. TXID=444342401216  time=1578251522779
> OP_CLOSE
> blocks: ... blk_74288647857_73526377369   blk_74454090866_73526374095
> 4. TXID=444342401394
> OP_SET_GENSTAMP_V2 
> generate stamp: 73526377369
> 5. TXID=444342401395  time=1578251522835
> OP_TRUNCATE
> 6. TXID=444342402176  time=1578251523246
> OP_CLOSE
> blocks: ... blk_74288647857_73526377369 
> {noformat}
> According to the edit logs, it's wield to see that stamp(73526377369) was 
> generated in transaction 4 but already used in transaction 3, and for 
> transaction 3 there should be only the last block changed but in fact the 
> last two blocks are both changed.
> This problem might be produced in a complex scenario that truncate operation 
> immediately followed the recover-lease operation for the same file. A 
> suspicious point is that between creation and being written for transaction 
> 3, stamp of the second last block was updated when committing block 
> synchronization caused by the truncate operation.
> Related calling stack is as follows: 
> {noformat}
> NameNodeRpcServer#commitBlockSynchronization
>   FSNamesystem#commitBlockSynchronization
>     // update last block
>     if(!copyTruncate) {
>       storedBlock.setGenerationStamp(newgenerationstamp); //updated the stamp 
> of the second last block in transaction 3 before being written
>       storedBlock.setNumBytes(newlength);
>     }
> {noformat}
> Any comments are welcome. Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

Reply via email to