[ https://issues.apache.org/jira/browse/HDFS-7082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14138555#comment-14138555 ]

Srikanth Upputuri commented on HDFS-7082:
-----------------------------------------

Currently, if the condition below in BlockManager#markBlockAsCorrupt is true, we 
go ahead and invalidate the corrupt replica. For the scenario in question, 
however, it evaluates to false.
{code}
    boolean hasMoreCorruptReplicas = minReplicationSatisfied &&
        (numberOfReplicas.liveReplicas() + numberOfReplicas.corruptReplicas()) >
        bc.getBlockReplication();
{code}
I propose to change this to 
{code}
    boolean hasMoreCorruptReplicas = minReplicationSatisfied &&
        (numberOfReplicas.liveReplicas() + numberOfReplicas.corruptReplicas()) >=
        bc.getBlockReplication();
{code}
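For illustration, here is a minimal standalone check of both conditions for the 
reported setup (3 datanodes, replication factor 3, one corrupt replica, so 
liveReplicas() = 2 and corruptReplicas() = 1). This is a sketch, not HDFS code: 
plain ints stand in for NumberReplicas, and minReplication = 1 is an assumption.
{code}
public class ConditionDemo {
    public static void main(String[] args) {
        int live = 2, corrupt = 1;       // DN3's replica is corrupt
        int replicationFactor = 3;       // equals the number of datanodes
        boolean minReplicationSatisfied = live >= 1; // assuming minReplication = 1

        // Current condition: 2 + 1 > 3 is false, so the corrupt replica is
        // never invalidated and no datanode slot is ever freed.
        boolean current = minReplicationSatisfied
            && (live + corrupt) > replicationFactor;

        // Proposed condition: 2 + 1 >= 3 is true, so the corrupt replica is
        // invalidated and re-replication can target the freed slot.
        boolean proposed = minReplicationSatisfied
            && (live + corrupt) >= replicationFactor;

        System.out.println("current=" + current + ", proposed=" + proposed);
    }
}
{code}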
This solves the current problem while retaining almost all of the existing 
behavior: we now allow 'total replicas' to drop to 'replication factor - 1', 
but no lower. This effectively vacates a slot on exactly one datanode and lets 
re-replication proceed. 

Example scenarios: 

1. DN1, DN2, DN3, replication factor = 3; DN3's replica is corrupt. 
The corrupt replica is invalidated and deleted. A new live replica will be 
written to DN3.

2. DN1, DN2, DN3, replication factor = 3; DN2's and DN3's replicas are corrupt. 
DN3 sends a block report. The corrupt replica on DN3 is invalidated and deleted. 
DN2 sends a block report. The corrupt replica on DN2 is not invalidated, as the 
current 'total replicas' < 'replication factor'. A new live replica will 
eventually be written to DN3. Then, on a subsequent block report from DN2, the 
corrupt replica will be deleted (a simulation sketch follows this list).
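As a sanity check on scenario 2, here is a small standalone simulation of the 
block-report sequence under the proposed '>=' condition. Again a sketch, not 
HDFS code: counters stand in for NumberReplicas, and minReplication = 1 is 
assumed.
{code}
public class Scenario2Demo {
    static final int REPLICATION_FACTOR = 3; // equals the number of datanodes
    static final int MIN_REPLICATION = 1;    // assumed minimal replication

    // The proposed '>=' condition, modelled with plain counters.
    static boolean shouldInvalidate(int live, int corrupt) {
        return live >= MIN_REPLICATION
            && live + corrupt >= REPLICATION_FACTOR;
    }

    public static void main(String[] args) {
        // DN1 is good; DN2 and DN3 hold corrupt replicas: live=1, corrupt=2.
        int live = 1, corrupt = 2;

        // DN3's block report: 1 + 2 >= 3, so DN3's corrupt replica is deleted.
        System.out.println("DN3 report: invalidate=" + shouldInvalidate(live, corrupt));
        corrupt--;

        // DN2's block report: 1 + 1 >= 3 is false, so DN2's replica survives
        // until re-replication restores a live replica on DN3.
        System.out.println("DN2 report: invalidate=" + shouldInvalidate(live, corrupt));

        // A new live replica is written to DN3: live=2, corrupt=1.
        live++;

        // DN2's next block report: 2 + 1 >= 3, so the last corrupt replica goes.
        System.out.println("DN2 re-report: invalidate=" + shouldInvalidate(live, corrupt));
    }
}
{code}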


> When replication factor equals number of data nodes, corrupt replica will 
> never get substituted with good replica
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-7082
>                 URL: https://issues.apache.org/jira/browse/HDFS-7082
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Srikanth Upputuri
>            Assignee: Srikanth Upputuri
>            Priority: Minor
>
> BlockManager will not invalidate a corrupt replica if doing so brings the 
> total number of replicas below the replication factor (except when the corrupt 
> replica has a wrong genstamp). On clusters where the replication factor equals 
> the total number of datanodes, a new replica cannot be created from a live 
> replica, as every available datanode already holds a replica. Because of this, 
> corrupt replicas will never be substituted with good replicas and so will 
> never get deleted. Sooner or later all replicas may become corrupt, leaving 
> no live replicas for this block in the cluster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
