[
https://issues.apache.org/jira/browse/HDFS-13999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16711732#comment-16711732
]
Erik Krogen commented on HDFS-13999:
------------------------------------
Thanks for fixing this [~jojochuang]! This has plagued us for a long time. It's
really great to hear that Dynamometer was useful for this!
> Bogus missing block warning if the file is under construction when NN starts
> ----------------------------------------------------------------------------
>
> Key: HDFS-13999
> URL: https://issues.apache.org/jira/browse/HDFS-13999
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.6.0
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Major
> Fix For: 2.7.8
>
> Attachments: HDFS-13999.branch-2.7.001.patch, webui missing blocks.png
>
>
> We found an interesting case where the web UI displays a few missing blocks
> but doesn't state which files are corrupt, while fsck reports the file
> system as healthy. This bug is similar to HDFS-10827 and HDFS-8533.
> (See the attachment for an example)
> Using Dynamometer, I was able to reproduce the bug, and realized that the
> "missing" blocks are actually healthy, but somehow neededReplications doesn't
> get updated when the NN receives block reports. What's more interesting is
> that the files associated with the "missing" blocks are under construction
> when the NN starts, and so after a while the NN prints file recovery logs.
> Given that, I determined the following code is the source of the bug:
> {code:java|title=BlockManager#addStoredBlock}
> ....
> // if file is under construction, then done for now
> if (bc.isUnderConstruction()) {
>   return storedBlock;
> }
> {code}
> which is wrong, because a file may have multiple blocks, and the first block
> may already be complete while the file is still under construction. In that
> case, the neededReplications structure doesn't get updated for the first
> block, hence the missing block warning on the web UI. The check should look
> at the state of the block itself, not the file.
> Fortunately, it was unintentionally fixed via HDFS-9754:
> {code:java}
> // if block is still under construction, then done for now
> if (!storedBlock.isCompleteOrCommitted()) {
>   return storedBlock;
> }
> {code}
> We should bring this fix into branch-2.7 too. That said, the warning is
> harmless and should go away once the under-construction files are recovered
> and the NN restarts (or full block reports are forced).
> Kudos to Dynamometer! It would be impossible to reproduce this bug without
> the tool. And thanks [~smeng] for helping with the reproduction.
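
To illustrate the difference between the two checks quoted above, here is a minimal, self-contained sketch. It is a toy model, not Hadoop code: BlockCheckSketch, FileModel, Block, BlockState and the two updatedWith... methods are hypothetical names made up for this example, and only isUnderConstruction() and isCompleteOrCommitted() mirror the method names in the quoted snippets. It assumes that "updating neededReplications" can be modelled as simply collecting the IDs of the blocks that get past the early return.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical stand-ins, not Hadoop classes.
public class BlockCheckSketch {

    enum BlockState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    static final class Block {
        final long id;
        final BlockState state;

        Block(long id, BlockState state) {
            this.id = id;
            this.state = state;
        }

        boolean isCompleteOrCommitted() {
            return state == BlockState.COMPLETE || state == BlockState.COMMITTED;
        }
    }

    static final class FileModel {
        final List<Block> blocks = new ArrayList<>();
        final boolean underConstruction;

        FileModel(boolean underConstruction) {
            this.underConstruction = underConstruction;
        }

        boolean isUnderConstruction() {
            return underConstruction;
        }
    }

    // File-level check (the old behaviour): every block of an
    // under-construction file is skipped, including complete ones.
    static List<Long> updatedWithFileLevelCheck(FileModel file) {
        List<Long> updated = new ArrayList<>();
        for (Block b : file.blocks) {
            if (file.isUnderConstruction()) {
                continue; // "done for now", replication state never updated
            }
            updated.add(b.id);
        }
        return updated;
    }

    // Block-level check (the later behaviour): only blocks that are
    // themselves still under construction are skipped.
    static List<Long> updatedWithBlockLevelCheck(FileModel file) {
        List<Long> updated = new ArrayList<>();
        for (Block b : file.blocks) {
            if (!b.isCompleteOrCommitted()) {
                continue;
            }
            updated.add(b.id);
        }
        return updated;
    }

    public static void main(String[] args) {
        // A file still open for write: first block complete, last block not.
        FileModel file = new FileModel(true);
        file.blocks.add(new Block(1L, BlockState.COMPLETE));
        file.blocks.add(new Block(2L, BlockState.UNDER_CONSTRUCTION));

        System.out.println("file-level check updated:  " + updatedWithFileLevelCheck(file));
        System.out.println("block-level check updated: " + updatedWithBlockLevelCheck(file));
    }
}
```

In this model the file-level check updates nothing for the open file, so the complete block 1 would stay flagged, while the block-level check updates block 1 and skips only block 2.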