[
https://issues.apache.org/jira/browse/HADOOP-4103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666819#action_12666819
]
Raghu Angadi commented on HADOOP-4103:
--------------------------------------
The scope of the fix is narrowed to the following:
# NameNode webui shows a message (probably in red) indicating whether there
are any missing blocks.
# Will most likely add Simon stats for this count.
# 'dfsadmin -metasave' can be used to find all the missing blocks.
## A later jira will enhance -metasave or add a separate, more user-friendly
command; currently -metasave is mainly meant for developers.
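As a rough illustration of the number the webui and Simon stat would surface, here is a hedged sketch (class and method names are mine for illustration, not taken from the NameNode source) that counts blocks whose live replica count has dropped to zero:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: track live replica counts per block and report
// how many blocks are "missing" (zero good replicas left). Names are
// illustrative, not from the actual Hadoop code.
public class MissingBlockCounter {
    private final Map<Long, Integer> liveReplicas = new HashMap<Long, Integer>();

    public void setReplicaCount(long blockId, int count) {
        liveReplicas.put(blockId, Integer.valueOf(count));
    }

    // The count the webui would show (probably in red) when it is > 0.
    public long missingBlockCount() {
        long missing = 0;
        for (int count : liveReplicas.values()) {
            if (count == 0) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        MissingBlockCounter counter = new MissingBlockCounter();
        counter.setReplicaCount(1L, 3);
        counter.setReplicaCount(2L, 0); // all replicas lost, e.g. dead datanodes
        counter.setReplicaCount(3L, 1);
        System.out.println(counter.missingBlockCount()); // prints 1
    }
}
```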
For this to be a straightforward fix, I need to make one policy change:
currently, if a block does not have any good replicas left, it is not included
in "neededReplications" list. I think this was done mainly as an
"optimization". But a cluster should not have any blocks in this state, and
even the name 'neededReplications' implies such blocks should be included. It
would be better if I don't need to add another list that needs to be
maintained.
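To make the policy change concrete, the simplified sketch below (invented names; the real under-replicated-blocks structure in the NameNode is more involved) keeps blocks with zero good replicas in the neededReplications queue at the highest priority instead of dropping them:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Simplified sketch of a priority-bucketed neededReplications list.
// Under the proposed policy, a block with zero good replicas is still
// queued (at the most urgent level) rather than silently excluded.
public class NeededReplications {
    private static final int LEVELS = 3;
    private final List<TreeSet<Long>> queues = new ArrayList<TreeSet<Long>>();

    public NeededReplications() {
        for (int i = 0; i < LEVELS; i++) {
            queues.add(new TreeSet<Long>());
        }
    }

    // Priority 0 (most urgent): no replicas left ("missing"),
    // priority 1: a single replica, priority 2: otherwise under-replicated.
    private int priority(int curReplicas) {
        if (curReplicas == 0) {
            return 0;
        }
        return (curReplicas == 1) ? 1 : 2;
    }

    public void add(long blockId, int curReplicas, int expectedReplicas) {
        if (curReplicas >= expectedReplicas) {
            return; // adequately replicated, nothing to do
        }
        queues.get(priority(curReplicas)).add(Long.valueOf(blockId));
    }

    // Blocks with no good replicas: the "missing blocks" count.
    public int missingBlocks() {
        return queues.get(0).size();
    }

    public int size() {
        int total = 0;
        for (TreeSet<Long> queue : queues) {
            total += queue.size();
        }
        return total;
    }

    public static void main(String[] args) {
        NeededReplications needed = new NeededReplications();
        needed.add(10L, 0, 3); // all replicas lost: now included, priority 0
        needed.add(11L, 1, 3); // under-replicated
        needed.add(12L, 3, 3); // fully replicated: not queued
        System.out.println(needed.missingBlocks() + " " + needed.size()); // prints "1 2"
    }
}
```

With the old policy the first block would never appear in the list at all, so no separate bookkeeping structure is needed once it is included here.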
> Alert for missing blocks
> ------------------------
>
> Key: HADOOP-4103
> URL: https://issues.apache.org/jira/browse/HADOOP-4103
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Affects Versions: 0.17.2
> Reporter: Christian Kunz
> Assignee: Raghu Angadi
>
> A whole bunch of datanodes were marked dead because of network problems
> that caused heartbeat timeouts, although the datanodes themselves were fine.
> Many processes started to fail because of the corrupted filesystem.
> In order to catch and diagnose such problems faster, the namenode should
> detect the corruption automatically and provide a way to alert operations. At
> a minimum it should show the fact of corruption on the GUI.