[ https://issues.apache.org/jira/browse/HDFS-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092134#comment-13092134 ]
Konstantin Shvachko commented on HDFS-2290:
-------------------------------------------
Two things.
# I set the replication of the file with the corrupt replica to 2 on a 3-node
cluster, hoping that the corrupt replica would be removed so that I could then set
replication back to 3. fsck shows the file as healthy, but the NN does not even try
to delete the corrupt replica. The DN keeps reporting the corrupt replica, and when
I set replication back to 3, I end up where I started.
The general problem seems to be that the NN does not schedule deletion of corrupt
replicas when the replication of the block is lowered.
# I started a 4th DN. Replication is triggered as expected, and then deletion of
the corrupt replica is scheduled, but the deletion fails on the DN with the
following exception:
{code}
11/08/26 16:19:38 WARN datanode.DataNode: Unexpected error trying to delete block blk_-4767793772698703708_1816. BlockInfo not found in volumeMap.
11/08/26 16:19:38 WARN datanode.DataNode: Error processing datanode Command
java.io.IOException: Error in deleting blocks.
	at org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:1681)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:1021)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:983)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:920)
	at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1439)
	at java.lang.Thread.run(Unknown Source)
{code}
I think the DN can ignore the absence of the metadata file, since it is deleting the block anyway.
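A minimal sketch of that idea, written as a self-contained in-memory model rather than a patch to FSDataset: the class `SimpleDataset`, its map, and its methods are hypothetical, and a real fix would also have to remove the block and metadata files from disk. A replica that is already missing from the map is logged and skipped instead of turning the whole command into an `IOException`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

/** Hypothetical, simplified stand-in for a DataNode's replica map (not Hadoop's FSDataset). */
class SimpleDataset {
    private static final Logger LOG = Logger.getLogger(SimpleDataset.class.getName());

    // blockId -> replica location; a simplified stand-in for volumeMap
    private final Map<Long, String> volumeMap = new HashMap<>();

    void addReplica(long blockId, String path) {
        volumeMap.put(blockId, path);
    }

    boolean contains(long blockId) {
        return volumeMap.containsKey(blockId);
    }

    /**
     * Deletes the given replicas. A replica that is already absent is treated as
     * deleted: the goal of the command (no replica on this node) is already met,
     * so it is logged and skipped rather than reported as an error.
     *
     * @return the ids that were actually removed from the map
     */
    List<Long> invalidate(long[] blockIds) {
        List<Long> removed = new ArrayList<>();
        for (long id : blockIds) {
            String path = volumeMap.remove(id);
            if (path == null) {
                LOG.warning("Block " + id + " not found in volumeMap; treating it as already deleted");
                continue;
            }
            // A real DataNode would delete the block file and its metadata file here.
            removed.add(id);
        }
        return removed;
    }
}
```

With replicas for blocks 1 and 2 present, invalidating blocks 1 and 3 removes block 1, logs a warning for block 3, and returns normally, so the missing replica does not fail the command.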
> Block with corrupt replica is not getting replicated
> ----------------------------------------------------
>
> Key: HDFS-2290
> URL: https://issues.apache.org/jira/browse/HDFS-2290
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: name-node
> Affects Versions: 0.22.0
> Reporter: Konstantin Shvachko
> Fix For: 0.22.0
>
>
> A block has one replica marked as corrupt and two good ones. countNodes()
> correctly detects that there are only 2 live replicas, and fsck reports the
> block as under-replicated. But ReplicationMonitor never schedules replication
> of good replicas.
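The replica accounting described in the issue, together with the first point of the comment above, can be modeled as a per-block decision. This is a hypothetical sketch, not Hadoop's BlockManager or ReplicationMonitor code: the class `ReplicaPlanner`, its `plan` method, and the node names are illustrative assumptions. It assumes corrupt replicas never count as live, that new replicas are copied only from a good source, and that corrupt replicas are deleted only once the live count reaches the target replication.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of the NameNode-side decision for one block; not Hadoop's BlockManager. */
class ReplicaPlanner {

    /** One replica of a block as seen by the NameNode (simplified). */
    static final class Replica {
        final String datanode;
        final boolean corrupt;

        Replica(String datanode, boolean corrupt) {
            this.datanode = datanode;
            this.corrupt = corrupt;
        }
    }

    /** What the NameNode should schedule for the block. */
    static final class Plan {
        final int replicasToAdd;
        final List<String> nodesToInvalidate;

        Plan(int replicasToAdd, List<String> nodesToInvalidate) {
            this.replicasToAdd = replicasToAdd;
            this.nodesToInvalidate = nodesToInvalidate;
        }
    }

    static Plan plan(List<Replica> replicas, int targetReplication) {
        int live = 0;
        List<String> corruptNodes = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.corrupt) {
                corruptNodes.add(r.datanode);
            } else {
                live++;
            }
        }
        // Corrupt replicas do not count toward replication; copy from a good one if any exist.
        int toAdd = live > 0 ? Math.max(0, targetReplication - live) : 0;
        // Delete corrupt replicas once enough good replicas exist; this also covers
        // the case where the replication of the block has just been lowered.
        List<String> toInvalidate =
                live >= targetReplication ? corruptNodes : Collections.<String>emptyList();
        return new Plan(toAdd, toInvalidate);
    }
}
```

For the three-node case (two good replicas, one corrupt), the sketch asks for one new replica at replication 3 and keeps the corrupt one for now; at replication 2 it schedules the corrupt replica for deletion, which is the behavior the first point expects from the NN.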
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira