[
https://issues.apache.org/jira/browse/HADOOP-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509953
]
Hairong Kuang commented on HADOOP-1448:
---------------------------------------
For the getBlockLocation operation, how about the following improvements?
1. For each block, the maximum number of replica locations returned is 3.
2. Instead of sorting, use the pseudo-sort in HADOOP-1155. The returned list
is in the order: local replica, local-rack replicas, and off-rack replicas.
This algorithm does not require a full linear scan of all replicas.
3. For the randomness, if there is a local copy, the first location is the
local copy; otherwise if there are any local-rack copies, set the first
location to be a random local-rack replica; otherwise, the first location is a
random off-rack replica.
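
As a rough illustration of the three points above, here is a minimal Java
sketch. It is not the HADOOP-1155 code: the class ReplicaOrdering, the Node
type and the order() method are hypothetical names made up for this example,
and a replica is assumed to be described only by a host name and a rack name.
The sketch stops scanning once a local copy and two local-rack copies have
been found, since the result is then already determined; otherwise it falls
back to scanning the remaining replicas.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ReplicaOrdering {
    /** Hypothetical stand-in for a datanode descriptor: host name plus rack name. */
    public static final class Node {
        public final String host;
        public final String rack;
        public Node(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    /** Cap from point 1: at most three locations are returned per block. */
    public static final int MAX_LOCATIONS = 3;

    /**
     * Returns at most MAX_LOCATIONS replicas in the order: local replica,
     * local-rack replicas, off-rack replicas. Within the local-rack and
     * off-rack groups the choice is randomized.
     */
    public static List<Node> order(Node client, List<Node> replicas, Random rnd) {
        Node local = null;
        List<Node> sameRack = new ArrayList<>();
        List<Node> offRack = new ArrayList<>();

        for (Node r : replicas) {
            if (r.host.equals(client.host)) {
                local = r;
            } else if (r.rack.equals(client.rack)) {
                sameRack.add(r);
            } else {
                offRack.add(r);
            }
            // Early exit: a local copy plus two local-rack copies fill the result.
            if (local != null && sameRack.size() >= MAX_LOCATIONS - 1) {
                break;
            }
        }

        // Randomize within each group so that clients spread their reads.
        Collections.shuffle(sameRack, rnd);
        Collections.shuffle(offRack, rnd);

        List<Node> result = new ArrayList<>(MAX_LOCATIONS);
        if (local != null) {
            result.add(local);
        }
        for (Node r : sameRack) {
            if (result.size() < MAX_LOCATIONS) result.add(r);
        }
        for (Node r : offRack) {
            if (result.size() < MAX_LOCATIONS) result.add(r);
        }
        return result;
    }
}
```

Shuffling whole groups is a simplification; a real implementation would
presumably pick the few random entries it needs without touching every
replica.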
> Setting the replication factor of a file too high causes namenode cpu overload
> ------------------------------------------------------------------------------
>
> Key: HADOOP-1448
> URL: https://issues.apache.org/jira/browse/HADOOP-1448
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Reporter: dhruba borthakur
>
> The replication factor of a file is set to 300 (on an 800-node cluster). Then
> all mappers try to open this file. For every open call that the namenode
> receives from each of these 800 clients, it sorts all the replicas of the
> block(s) based on the distance from the client. This causes CPU usage
> overload on the namenode.
> One proposal is to make the namenode return a non-sorted list of datanodes to
> the client. Information about each replica also contains the rack on which
> that replica resides. The client can look at the replicas to determine if
> there is a copy on the local node. If not, then it can find out if there is a
> replica on the local rack. If not, then it can choose a replica at random.
> This proposal is scalable because the sorting and selection of replicas are
> done by the client rather than the Namenode.
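
The client-side selection proposed in the quoted description could look
roughly like the following Java sketch. ClientReplicaChooser, Replica and
choose() are hypothetical names for this example, not existing DFS client
classes. When several local-rack replicas exist, the sketch picks one of them
at random, which is an assumption; the description only says the client
checks whether one exists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ClientReplicaChooser {
    /** Hypothetical replica record as returned, unsorted, by the namenode. */
    public static final class Replica {
        public final String host;
        public final String rack;
        public Replica(String host, String rack) {
            this.host = host;
            this.rack = rack;
        }
    }

    /**
     * Picks a replica to read from: a copy on the local node if there is one,
     * otherwise a copy on the local rack, otherwise any replica at random.
     */
    public static Replica choose(String clientHost, String clientRack,
                                 List<Replica> replicas, Random rnd) {
        if (replicas.isEmpty()) {
            throw new IllegalArgumentException("no replicas to choose from");
        }
        List<Replica> sameRack = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.host.equals(clientHost)) {
                return r; // local copy wins immediately
            }
            if (r.rack.equals(clientRack)) {
                sameRack.add(r);
            }
        }
        // No local copy: prefer the local rack, else fall back to all replicas.
        List<Replica> pool = sameRack.isEmpty() ? replicas : sameRack;
        return pool.get(rnd.nextInt(pool.size()));
    }
}
```

The work here is one pass over the replica list per open, paid by each
client instead of by the namenode.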
--
This message is automatically generated by JIRA.