[ https://issues.apache.org/jira/browse/HDFS-782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780872#action_12780872 ]
Ning Zhang commented on HDFS-782:
---------------------------------
To elaborate on the proposal: a data node keeps statistics on how many
clients are requesting a given block. If the number exceeds a certain
threshold, the data node can send the block to a number of other data nodes
(children) and ask them to replicate it (one heuristic is to choose from the
data nodes that asked for the block). If a child data node accepts the
replication request (e.g., it doesn't already hold the block), it goes through
the same protocol as adding a new replica acknowledged by the name node. The
reason we propose datanode->datanode replication rather than
datanode->namenode->datanode replication is that the former is much faster
than the latter (which, depending on the workload of the name node, could take
minutes). If the children also get too many requests, they can proactively
replicate the block recursively, until the requests are distributed across a
sufficient number of replicas.
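As a rough illustration of the data-node side described above, here is a
minimal Python sketch of the request statistics and the threshold check. It is
not HDFS code: the class HotBlockTracker, its parameters (threshold,
window_seconds, max_children), and the sliding-window counting are all
assumptions made for the example; the proposal itself only says "a certain
threshold".

```python
from collections import defaultdict, deque


class HotBlockTracker:
    """Hypothetical per-data-node request statistics; not an HDFS class."""

    def __init__(self, threshold=3, window_seconds=10.0, max_children=2):
        self.threshold = threshold            # assumed: requests tolerated per window
        self.window_seconds = window_seconds  # assumed: length of the sliding window
        self.max_children = max_children      # assumed: cap on new replicas per trigger
        # block id -> deque of (timestamp, requester) still inside the window
        self.requests = defaultdict(deque)

    def record_request(self, block_id, requester, now):
        """Record one read request; return the children to replicate to, if any."""
        window = self.requests[block_id]
        window.append((now, requester))
        # Drop requests that have fallen out of the sliding window.
        while window and now - window[0][0] > self.window_seconds:
            window.popleft()
        if len(window) <= self.threshold:
            return []
        # Heuristic from the proposal: choose among the nodes that asked for
        # the block, in the order they first appear in the window.
        children = []
        for _, node in window:
            if node not in children:
                children.append(node)
        return children[: self.max_children]


def accepts_replication(held_blocks, block_id):
    """A child accepts the request only if it does not already hold the block."""
    return block_id not in held_blocks
```

A caller would invoke record_request on every read and, when the returned list
is non-empty, send the block to each child for which accepts_replication is
true. The child would then register the new replica with the name node through
the usual path, which the sketch leaves out.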
Currently the name node cleans up extra replicas periodically. To support
DN->DN dynamic replication, we need to add a heuristic so that it cleans up
extra replicas only when they have not been accessed for a certain period.
Any suggestions?
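The cleanup heuristic above could look roughly like the following Python
sketch. The function replicas_to_remove and its parameters are hypothetical,
and the sketch assumes the name node somehow knows the last access time of
each replica (for example, reported by the data nodes), which the proposal
does not specify.

```python
def replicas_to_remove(last_access, target_replication, idle_seconds, now):
    """Pick extra replicas that are safe to delete.

    last_access maps a data node name to the time its replica was last read
    (assumed to be available to the name node).
    """
    extra = len(last_access) - target_replication
    if extra <= 0:
        return []
    # Only replicas idle for longer than idle_seconds are candidates.
    idle = [node for node, ts in last_access.items() if now - ts > idle_seconds]
    # Remove the coldest replicas first, and never go below the target.
    idle.sort(key=lambda node: last_access[node])
    return idle[:extra]
```

With this rule a recently read extra replica survives the periodic cleanup,
so a block that is still hot keeps its additional copies.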
> dynamic replication
> -------------------
>
> Key: HDFS-782
> URL: https://issues.apache.org/jira/browse/HDFS-782
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Ning Zhang
>
> In a large and busy cluster, a block can be requested by many clients at the
> same time. HDFS-767 tries to solve the failing case when the # of retries
> exceeds the maximum # of retries. However, that patch doesn't solve the
> performance issue, since all failing clients have to wait a certain period
> before retrying, and the # of retries could be high.
> One solution to the performance issue is to increase the # of replicas
> for this "hot" block dynamically when it is requested many times within a
> short period. The name node needs to be aware of such a situation and only
> clean up extra replicas when they have not been accessed recently.