Todd Lipcon updated HDFS-3605:
------------------------------

    Attachment: hdfs-3605.txt

Hey Uma. I took your unit test (thanks) and modified it to be minimal and to remove the sleeps. I then prepared a patch with a slightly different approach: I now use a boolean inside BlockManager to determine whether to do the block postponement. I think this is a bit simpler and it still fixes the issue. Am I missing another case with this fix?

The optimization you did might be useful, but per the above I think we can keep this change minimal and optimize separately; I don't think the optimization is required for the bugfix.

This patch isn't quite final - I still want to add a few javadocs, etc.

> Block mistakenly marked corrupt during edit log catchup phase of failover
> -------------------------------------------------------------------------
>
>                 Key: HDFS-3605
>                 URL: https://issues.apache.org/jira/browse/HDFS-3605
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, name-node
>    Affects Versions: 2.0.0-alpha, 2.0.1-alpha
>            Reporter: Brahma Reddy Battula
>            Assignee: Todd Lipcon
>         Attachments: HDFS-3605.patch, TestAppendBlockMiss.java, hdfs-3605.txt
>
> Open a file for append.
> Write data and sync.
> After the next log roll and edit log tailing on the standby NN, close the append stream.
> Call append on the same file multiple times before the next edit log roll.
> Now abruptly kill the current active namenode.
> At this point the block is missing. This appears to be because all of the latest block updates were queued on the standby namenode; during failover, processing the first OP_CLOSE drained the pending queue and marked the block as corrupt.
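For readers trying to reproduce the scenario, the following is a hedged sketch of the steps quoted above using the standard MiniDFSCluster HA test utilities. It is not the attached TestAppendBlockMiss.java; the class name, data sizes, and number of appends are illustrative assumptions.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.MiniDFSNNTopology;
import org.apache.hadoop.hdfs.server.namenode.ha.HATestUtil;

public class AppendDuringCatchupRepro {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .nnTopology(MiniDFSNNTopology.simpleHATopology())
        .numDataNodes(3)
        .build();
    try {
      cluster.waitActive();
      cluster.transitionToActive(0);
      FileSystem fs = HATestUtil.configureFailoverFs(cluster, conf);
      Path file = new Path("/test-append");

      // Open the file, write some data and sync it.
      FSDataOutputStream out = fs.create(file);
      out.write(new byte[512]);
      out.hsync();

      // Roll the active NN's edits and let the standby tail them,
      // then close the original stream.
      HATestUtil.waitForStandbyToCatchUp(cluster.getNameNode(0),
          cluster.getNameNode(1));
      out.close();

      // Append several times before the next edit log roll.
      for (int i = 0; i < 3; i++) {
        FSDataOutputStream appendOut = fs.append(file);
        appendOut.write(new byte[512]);
        appendOut.close();
      }

      // Abruptly kill the active NN and fail over to the standby.
      cluster.shutdownNameNode(0);
      cluster.transitionToActive(1);

      // Without the fix, the last block can be marked corrupt during catchup
      // and the data goes missing; with the fix the length should be intact.
      long len = fs.getFileStatus(file).getLen();
      System.out.println("File length after failover: " + len);
    } finally {
      cluster.shutdown();
    }
  }
}
{code}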
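And here is a minimal, self-contained sketch of the boolean-flag idea described in the comment above: a flag inside BlockManager decides whether replicas reported with a newer generation stamp are queued for later re-processing instead of being marked corrupt immediately. All names here (shouldPostponeBlocksFromFuture, ReportedBlock, queuedReports) are illustrative assumptions, not necessarily the names used in the attached patch.

{code:java}
import java.util.ArrayDeque;
import java.util.Queue;

class BlockManagerSketch {

  static final class ReportedBlock {
    final long blockId;
    final long genStamp;
    ReportedBlock(long blockId, long genStamp) {
      this.blockId = blockId;
      this.genStamp = genStamp;
    }
  }

  // True while the standby NN is still catching up on edits; flipped off
  // once failover completes and the queued reports are re-processed.
  private volatile boolean shouldPostponeBlocksFromFuture = false;
  private final Queue<ReportedBlock> queuedReports = new ArrayDeque<ReportedBlock>();

  void setPostponeBlocksFromFuture(boolean postpone) {
    this.shouldPostponeBlocksFromFuture = postpone;
  }

  void processReportedBlock(ReportedBlock reported, long storedGenStamp) {
    if (shouldPostponeBlocksFromFuture && reported.genStamp > storedGenStamp) {
      // The report describes a state "from the future" relative to the edits
      // this NN has read so far; queue it instead of marking it corrupt.
      queuedReports.add(reported);
      return;
    }
    if (reported.genStamp != storedGenStamp) {
      markCorrupt(reported);
    }
  }

  private void markCorrupt(ReportedBlock reported) {
    System.out.println("Marking block " + reported.blockId + " as corrupt");
  }
}
{code}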