[ https://issues.apache.org/jira/browse/HDFS-782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13885774#comment-13885774 ]

Jordan Mendelson commented on HDFS-782:
---------------------------------------

Could this not be implemented in response to a client reading a remote block? 
The client is already copying the block across the network in order to 
operate on it, so no extra network traffic is needed. A replication storm 
shouldn't happen in this case, since nothing is being copied proactively. And 
because the client is reading the remote block, we can be reasonably sure the 
block could use an extra replica. 

This could also speed up replication of a recently written block, since we 
can reuse the data that has just been copied (even if the machine is a 
sub-optimal location for the block, it would at least increase data 
availability until the block can be replicated properly). Deletion of 
over-replicated blocks could happen when free space runs low.

The downside seems to be the potential for extra disk writes. If every remote 
read of a complete block results in that block being stored on the machine 
doing the read, we could end up writing a lot of data. This could be 
mitigated somewhat with some sort of upper-replica limit.
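The read-triggered scheme with an upper-replica limit could be sketched as
follows. This is purely illustrative: the class, method names, and the
MAX_REPLICAS threshold are assumptions for the sketch, not actual HDFS code.

```python
# Hypothetical sketch of read-triggered replication with an upper-replica
# limit; names and thresholds are illustrative, not HDFS APIs.

MAX_REPLICAS = 5  # assumed cap to bound the extra disk writes


class BlockMap:
    """Tracks which datanodes hold a replica of each block."""

    def __init__(self):
        self.replicas = {}  # block_id -> set of datanode ids

    def add_replica(self, block_id, datanode):
        self.replicas.setdefault(block_id, set()).add(datanode)

    def replica_count(self, block_id):
        return len(self.replicas.get(block_id, set()))

    def on_remote_read(self, block_id, reader_node):
        """Called after a client on reader_node has read a complete remote
        block. The data has already crossed the network, so keeping it as
        an extra replica costs only the local disk write."""
        if reader_node in self.replicas.get(block_id, set()):
            return False  # reader already holds a local replica
        if self.replica_count(block_id) >= MAX_REPLICAS:
            return False  # cap reached; don't write yet another copy
        self.add_replica(block_id, reader_node)
        return True
```

The cap is what keeps "every remote read writes a block" from turning into
unbounded disk usage: once a hot block hits the limit, further readers
serve it without storing it.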

> dynamic replication
> -------------------
>
>                 Key: HDFS-782
>                 URL: https://issues.apache.org/jira/browse/HDFS-782
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Ning Zhang
>
> In a large and busy cluster, a block can be requested by many clients at the 
> same time. HDFS-767 tries to solve the failing case where the # of retries 
> exceeds the maximum. However, that patch doesn't solve the performance issue, 
> since all failing clients have to wait a certain period before retrying, and 
> the # of retries could be high. 
> One solution to the performance issue is to increase the # of replicas for 
> this "hot" block dynamically when it is requested many times in a short 
> period. The name node needs to be aware of such a situation and should only 
> clean up extra replicas when they have not been accessed recently. 
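The quoted proposal (raise the replica target for a block read many times in
a short period, then reclaim extras once the block goes idle) could be
sketched as below. Every name and threshold here is an assumption made for
illustration; this is not name node code.

```python
# Hypothetical sketch of "hot block" detection and idle-based cleanup;
# thresholds and the sliding-window policy are illustrative assumptions.
import time

BASE_REPLICATION = 3          # normal replica count
MAX_REPLICAS = 10             # cap so a hot block cannot grow without bound
HOT_THRESHOLD = 10            # reads inside WINDOW that mark a block "hot"
WINDOW = 60.0                 # "requested many times in a short period", secs
IDLE_BEFORE_CLEANUP = 300.0   # reclaim extras after this much idle time


class HotBlockTracker:
    def __init__(self, clock=time.monotonic):
        self.clock = clock    # injectable clock, handy for testing
        self.reads = {}       # block_id -> recent read timestamps
        self.target = {}      # block_id -> raised target replica count

    def record_read(self, block_id):
        now = self.clock()
        # Keep only reads inside the sliding window, then add this one.
        times = [t for t in self.reads.get(block_id, []) if now - t <= WINDOW]
        times.append(now)
        self.reads[block_id] = times
        if len(times) >= HOT_THRESHOLD:
            # Hot: raise the target so more data nodes can serve readers.
            current = self.target.get(block_id, BASE_REPLICATION)
            self.target[block_id] = min(current + 1, MAX_REPLICAS)

    def cleanup_targets(self):
        # Drop raised targets for blocks not accessed recently; the extra
        # replicas then become over-replicated and eligible for deletion.
        now = self.clock()
        for block_id in list(self.target):
            times = self.reads.get(block_id) or [0.0]
            if now - times[-1] > IDLE_BEFORE_CLEANUP:
                del self.target[block_id]
```

Cleanup keyed on last access time, rather than free space alone, matches the
reporter's point that extras should only be removed once the block has not
been accessed recently.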



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
