Todd Lipcon created HDFS-4015:
---------------------------------
Summary: Safemode should count and report orphaned blocks
Key: HDFS-4015
URL: https://issues.apache.org/jira/browse/HDFS-4015
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Affects Versions: 3.0.0
Reporter: Todd Lipcon
The safemode status currently reports the number of unique reported blocks
compared to the total number of blocks referenced by the namespace. However, it
does not report the inverse: blocks which are reported by datanodes but not
referenced by the namespace.
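To make the two counts concrete, here is a minimal sketch of the bookkeeping, assuming blocks are identified by plain long IDs and both sides fit in in-memory sets. The class and method names are hypothetical and are not taken from the NameNode code; the real block map and block report processing are more involved.

```java
import java.util.Set;

// Hypothetical illustration only; not the actual NameNode implementation.
final class OrphanedBlockSketch {

    /** Blocks the namespace references that at least one datanode has reported. */
    static long countReportedExpected(Set<Long> namespaceBlocks, Set<Long> reportedBlocks) {
        long count = 0;
        for (Long id : namespaceBlocks) {
            if (reportedBlocks.contains(id)) {
                count++;
            }
        }
        return count;
    }

    /** The inverse: blocks reported by datanodes that no file in the namespace references. */
    static long countOrphaned(Set<Long> namespaceBlocks, Set<Long> reportedBlocks) {
        long count = 0;
        for (Long id : reportedBlocks) {
            if (!namespaceBlocks.contains(id)) {
                count++;
            }
        }
        return count;
    }
}
```

The first count is what safemode already reports today; the second is the statistic this issue proposes to add.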
In the case that an admin accidentally starts up from an old image, this can be
confusing: safemode and fsck will show "corrupt files", which are actually files
that had been deleted but were resurrected by restarting from the old image.
This may convince the admin that it is safe to force-leave safemode and remove
these files -- after all, they know that those files should really have been
deleted. However, they may not be aware that leaving safemode will also
unrecoverably delete a bunch of other block files which have been orphaned due
to the namespace rollback.
I'd like to consider reporting something like: "900000 of expected 1000000
blocks have been reported. Additionally, 10000 blocks have been reported which
do not correspond to any file in the namespace. Forcing exit of safemode will
unrecoverably remove those data blocks."
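As a rough sketch of how such a status line could be assembled, assuming the three counts are already available as longs: the class and method names below are hypothetical, and the wording simply follows the example message above rather than any existing NameNode output.

```java
import java.util.Locale;

// Hypothetical illustration only; not the actual safemode status code.
final class SafemodeStatusSketch {

    /** Builds the proposed status text; the orphan warning appears only when orphans exist. */
    static String describe(long reported, long expected, long orphaned) {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format(Locale.ROOT,
            "%d of expected %d blocks have been reported.", reported, expected));
        if (orphaned > 0) {
            sb.append(String.format(Locale.ROOT,
                " Additionally, %d blocks have been reported which do not correspond"
                + " to any file in the namespace. Forcing exit of safemode will"
                + " unrecoverably remove those data blocks.", orphaned));
        }
        return sb.toString();
    }
}
```

Calling `SafemodeStatusSketch.describe(900000, 1000000, 10000)` would produce the text of the example message; with zero orphaned blocks only the first sentence is emitted, so the warning stays out of the way in the normal case.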
Using this statistic for some kind of "inverse safe mode" would be the logical
next step, but just reporting it as a warning seems easy enough to accomplish
and worth doing.