[jira] [Commented] (HDFS-2290) Block with corrupt replica is not getting replicated

Konstantin Shvachko (JIRA) Fri, 26 Aug 2011 16:39:53 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092134#comment-13092134
 ]


Konstantin Shvachko commented on HDFS-2290:
-------------------------------------------

Two things.
# Setting replication to 2 for the corrupt file on a 3-node cluster, in the 
hope that the corrupt replica would be removed and I could then set replication 
back to 3. fsck shows a healthy file, but the NN does not even try to delete 
the corrupt replica. The DN keeps reporting the corrupt replica, and when I set 
replication back to 3, I end up where I started. 
The general problem seems to be that the NN does not schedule deletion of 
corrupt replicas when you lower the replication of the block.
# Starting a 4th DN. Replication is triggered as expected, and then removal of 
the corrupt replica is scheduled, but the latter fails with the following 
exception:
{code}
11/08/26 16:19:38 WARN datanode.DataNode: Unexpected error trying to delete block blk_-4767793772698703708_1816. BlockInfo not found in volumeMap.
11/08/26 16:19:38 WARN datanode.DataNode: Error processing datanode Command
java.io.IOException: Error in deleting blocks.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:1681)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:1021)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:983)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:920)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1439)
        at java.lang.Thread.run(Unknown Source)
{code}
I think the DN can ignore the absence of the metadata file if it is deleting it anyway.
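
A minimal sketch of that idea in Java, to make the suggestion concrete. ToyDataset, its volumeMap field and the invalidate() signature here are hypothetical stand-ins, not the real FSDataset code; the assumption is that a block missing from the map can be treated as already deleted, so it is recorded and skipped instead of failing the whole command.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a DataNode block map (hypothetical, not the real FSDataset). */
class ToyDataset {
    // Hypothetical in-memory map from block id to block file path.
    private final Map<Long, String> volumeMap = new HashMap<>();

    void addBlock(long blockId, String path) {
        volumeMap.put(blockId, path);
    }

    boolean contains(long blockId) {
        return volumeMap.containsKey(blockId);
    }

    /**
     * Removes the given blocks from the map. A block that is not in the map
     * is assumed to be gone already: it is collected in the returned list
     * (so the caller can log it) and the loop continues with the next block.
     */
    List<Long> invalidate(long[] blockIds) {
        List<Long> alreadyGone = new ArrayList<>();
        for (long id : blockIds) {
            if (volumeMap.remove(id) == null) {
                alreadyGone.add(id); // nothing to delete; do not throw
            }
        }
        return alreadyGone;
    }
}
```

With this shape, one missing entry does not prevent the remaining blocks in the same command from being deleted. Whether the real DN should also report such blocks back to the NN is left open here.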

> Block with corrupt replica is not getting replicated
> ----------------------------------------------------
>
>                 Key: HDFS-2290
>                 URL: https://issues.apache.org/jira/browse/HDFS-2290
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Konstantin Shvachko
>             Fix For: 0.22.0
>
>
> A block has one replica marked as corrupt and two good ones. countNodes() 
> correctly detects that there are only 2 live replicas, and fsck reports the 
> block as under-replicated. But ReplicationMonitor never schedules replication 
> of good replicas.
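
The replica accounting discussed in this issue can be sketched as below, in Java. ReplicaPlanner and all of its members are hypothetical names for illustration, not NameNode code. The sketch encodes the expected behavior as an assumption: a block is under-replicated when its live count is below the target, and corrupt replicas become candidates for deletion once the live count meets the target, including after the target is lowered.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of replica accounting; not the NameNode implementation. */
class ReplicaPlanner {
    enum State { LIVE, CORRUPT }

    static final class Replica {
        final String node;
        final State state;

        Replica(String node, State state) {
            this.node = node;
            this.state = state;
        }
    }

    /** Counts replicas that are not marked corrupt. */
    static int countLive(List<Replica> replicas) {
        int live = 0;
        for (Replica r : replicas) {
            if (r.state == State.LIVE) {
                live++;
            }
        }
        return live;
    }

    /** Corrupt replicas do not count toward the replication target. */
    static boolean isUnderReplicated(List<Replica> replicas, int target) {
        return countLive(replicas) < target;
    }

    /**
     * Returns the nodes whose corrupt replicas can be deleted, which is only
     * the case when enough live replicas exist for the given target.
     */
    static List<String> corruptToInvalidate(List<Replica> replicas, int target) {
        List<String> nodes = new ArrayList<>();
        if (countLive(replicas) >= target) {
            for (Replica r : replicas) {
                if (r.state == State.CORRUPT) {
                    nodes.add(r.node);
                }
            }
        }
        return nodes;
    }
}
```

For two live replicas and one corrupt replica, a target of 3 gives an under-replicated block with nothing to invalidate, while a target of 2 gives a fully replicated block whose corrupt replica should be scheduled for deletion.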

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
