[
https://issues.apache.org/jira/browse/HDFS-782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781481#action_12781481
]
Ning Zhang commented on HDFS-782:
---------------------------------
Thanks for the comments, Konstantin.
The first two issues you pointed out are intentional, since the current
replication mechanism is too slow in a large and busy cluster. In my
experiments, changing the replication factor of a very small file (6 MB) from
3 to 6 takes around 15 seconds, but changing it from 3 to 18 takes more than
18 minutes. If too many clients are requesting the same block, the most
effective approach is to spend some bandwidth to quickly replicate the block
to other DataNodes and distribute the workload across them. I think this also
solves the inbound network issue, where too many clients request connections
to the same NIC and the probability of packet loss increases dramatically.
The 3rd issue is very valid. We have not considered the situation where the
cluster is shared, as in the case of Amazon EC2. I agree with Dhruba that we
should come up with some heuristics to limit the "replication storm".
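One possible shape for such a heuristic, as a rough sketch only and not the
HDFS-782 patch: count recent reads per block in a sliding window, raise the
target replication in proportion to that count, and cap it at a fixed maximum
so a burst of readers cannot trigger unbounded replication. Once the reads age
out of the window, the target falls back to the base value and the extra
replicas become candidates for cleanup. The class name `HotBlockTracker` and
all of its parameters are hypothetical and do not correspond to any existing
NameNode API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: sliding-window read counter with a capped replication target. */
public class HotBlockTracker {
    private final int baseReplication;      // normal replication factor, e.g. 3
    private final int maxReplication;       // hard cap that bounds a "replication storm"
    private final long windowMillis;        // how long a read counts as "recent"
    private final int readsPerExtraReplica; // recent reads needed to justify one more replica
    private final Map<String, Deque<Long>> recentReads = new HashMap<>();

    public HotBlockTracker(int baseReplication, int maxReplication,
                           long windowMillis, int readsPerExtraReplica) {
        this.baseReplication = baseReplication;
        this.maxReplication = maxReplication;
        this.windowMillis = windowMillis;
        this.readsPerExtraReplica = readsPerExtraReplica;
    }

    /** Record one client read of the given block at time nowMillis. */
    public void recordRead(String blockId, long nowMillis) {
        Deque<Long> reads = recentReads.computeIfAbsent(blockId, k -> new ArrayDeque<>());
        reads.addLast(nowMillis);
        expire(reads, nowMillis);
    }

    /** Desired replication for the block: base plus extras for recent reads, capped. */
    public int targetReplication(String blockId, long nowMillis) {
        Deque<Long> reads = recentReads.get(blockId);
        if (reads == null) {
            return baseReplication;
        }
        expire(reads, nowMillis);
        if (reads.isEmpty()) {
            // Block is no longer hot: forget it so extra replicas can be cleaned up.
            recentReads.remove(blockId);
            return baseReplication;
        }
        int extra = reads.size() / readsPerExtraReplica;
        return Math.min(maxReplication, baseReplication + extra);
    }

    /** Drop reads that have fallen out of the sliding window. */
    private void expire(Deque<Long> reads, long nowMillis) {
        while (!reads.isEmpty() && nowMillis - reads.peekFirst() >= windowMillis) {
            reads.removeFirst();
        }
    }
}
```

The cap is what limits the storm here; a shared cluster would probably also
want a cluster-wide budget on concurrent extra replications, which this sketch
does not model.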
> dynamic replication
> -------------------
>
> Key: HDFS-782
> URL: https://issues.apache.org/jira/browse/HDFS-782
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Ning Zhang
>
> In a large and busy cluster, a block can be requested by many clients at the
> same time. HDFS-767 tries to solve the failing case when the # of retries
> exceeds the maximum # of retries. However, that patch doesn't solve the
> performance issue, since all failing clients have to wait a certain period
> before retrying, and the # of retries could be high.
> One way to address the performance issue is to increase the # of replicas
> for this "hot" block dynamically when it is requested many times within a
> short period. The name node needs to be aware of this situation and should
> clean up the extra replicas only when they have not been accessed recently.