[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15034287#comment-15034287 ]
Kihwal Lee commented on HDFS-4015:
----------------------------------
[~anu], I have seen intermittent test failures in precommit builds. Do you think
it is a test issue?
{noformat}
java.lang.AssertionError: expected:<18> but was:<0>
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:555)
at org.junit.Assert.assertEquals(Assert.java:542)
at org.apache.hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency.testGenerationStampInFuture(TestNameNodeMetadataConsistency.java:125)
{noformat}
> Safemode should count and report orphaned blocks
> ------------------------------------------------
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Todd Lipcon
> Assignee: Anu Engineer
> Fix For: 2.8.0
>
> Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch,
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch,
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks
> compared to the total number of blocks referenced by the namespace. However,
> it does not report the inverse: blocks which are reported by datanodes but
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can
> be confusing: safemode and fsck will show "corrupt files", which are actually
> files that were deleted but got resurrected by restarting from the old image.
> This can convince the admin that it is safe to force-leave safemode and remove
> these files -- after all, they know that those files should really have been
> deleted. However, they are not aware that leaving safemode will also
> unrecoverably delete a bunch of other block files which have been orphaned due
> to the namespace rollback.
> I'd like to consider reporting something like: "900000 of expected 1000000
> blocks have been reported. Additionally, 10000 blocks have been reported
> which do not correspond to any file in the namespace. Forcing exit of
> safemode will unrecoverably remove those data blocks."
> Whether this statistic should also drive some kind of "inverse safe mode" is
> a logical next question, but just reporting it as a warning seems easy enough
> to accomplish and worth doing.
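
For illustration only, the two counts described in the issue could be computed along the following lines. This is a minimal sketch, not the NameNode's actual safemode code: the class OrphanBlockReport, its methods, and the use of plain sets of block IDs are all hypothetical, chosen just to show the set arithmetic behind the proposed message.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; none of these names exist in HDFS.
final class OrphanBlockReport {

    /** Blocks referenced by the namespace that at least one datanode has reported. */
    public static long countReported(Set<Long> namespaceBlocks, Set<Long> reportedBlocks) {
        Set<Long> seen = new HashSet<>(namespaceBlocks);
        seen.retainAll(reportedBlocks); // intersection
        return seen.size();
    }

    /** Blocks reported by datanodes that no file in the namespace references. */
    public static long countOrphaned(Set<Long> namespaceBlocks, Set<Long> reportedBlocks) {
        Set<Long> orphaned = new HashSet<>(reportedBlocks);
        orphaned.removeAll(namespaceBlocks); // set difference
        return orphaned.size();
    }

    /** Builds a status line modeled on the wording proposed in the issue. */
    public static String summarize(Set<Long> namespaceBlocks, Set<Long> reportedBlocks) {
        long reported = countReported(namespaceBlocks, reportedBlocks);
        long orphaned = countOrphaned(namespaceBlocks, reportedBlocks);
        StringBuilder sb = new StringBuilder();
        sb.append(String.format("%d of expected %d blocks have been reported.",
                reported, namespaceBlocks.size()));
        if (orphaned > 0) {
            // Only warn when there is something that leaving safemode would delete.
            sb.append(String.format(" Additionally, %d blocks have been reported"
                    + " which do not correspond to any file in the namespace."
                    + " Forcing exit of safemode will unrecoverably remove those"
                    + " data blocks.", orphaned));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up block IDs: the namespace knows 1-3, datanodes report 2-5.
        Set<Long> namespace = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> reported = new HashSet<>(Arrays.asList(2L, 3L, 4L, 5L));
        System.out.println(summarize(namespace, reported));
    }
}
```

The sketch keeps the orphan warning separate from the existing "reported of expected" count, so the first part of the message is unchanged when no orphaned blocks exist.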
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)