[ https://issues.apache.org/jira/browse/HADOOP-1448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hairong Kuang updated HADOOP-1448:
----------------------------------

    Attachment: getBlockLocation.patch

This patch is a little simpler than what I have proposed.
1. The returned list contains all the replica locations.
2. In the returned location list, a local replica is followed by a local-rack replica, then by the rest of the replicas.
3. If there are no local or local-rack replicas, the first entry in the list is a random replica.

> Setting the replication factor of a file too high causes namenode cpu overload
> -------------------------------------------------------------------------------
>
>                 Key: HADOOP-1448
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1448
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Attachments: getBlockLocation.patch
>
>
> The replication factor of a file is set to 300 (on an 800-node cluster). Then all mappers try to open this file. For every open call the namenode receives from each of these 800 clients, it sorts all the replicas of the block(s) by distance from the client. This causes CPU overload on the namenode.
> One proposal is to make the namenode return an unsorted list of datanodes to the client. Information about each replica also contains the rack on which that replica resides. The client can look at the replicas to determine whether there is a copy on the local node. If not, it can check whether there is a replica on the local rack. If neither exists, it can choose a replica at random.
> This proposal is scalable because the sorting and selection of replicas is done by the client rather than the namenode.
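For illustration, here is a minimal sketch of the client-side selection the proposal describes (prefer a local replica, then a local-rack replica, otherwise a random one). The ReplicaLocation class, its field names, and the choose() method are hypothetical stand-ins for this sketch, not the actual classes touched by getBlockLocation.patch:

    import java.util.List;
    import java.util.Random;

    // Hypothetical replica descriptor: host and rack of one datanode holding the block.
    class ReplicaLocation {
        final String host;
        final String rack;
        ReplicaLocation(String host, String rack) { this.host = host; this.rack = rack; }
    }

    class ReplicaChooser {
        private static final Random RAND = new Random();

        // Pick a replica to read from: prefer a copy on the local node,
        // then one on the local rack, otherwise a random replica.
        // Assumes the replica list is non-empty.
        static ReplicaLocation choose(List<ReplicaLocation> replicas,
                                      String localHost, String localRack) {
            ReplicaLocation onRack = null;
            for (ReplicaLocation r : replicas) {
                if (r.host.equals(localHost)) {
                    return r;                       // a local copy wins outright
                }
                if (onRack == null && r.rack.equals(localRack)) {
                    onRack = r;                     // remember the first local-rack copy
                }
            }
            if (onRack != null) {
                return onRack;
            }
            // no local or local-rack replica: fall back to a random one
            return replicas.get(RAND.nextInt(replicas.size()));
        }
    }

Each client only does a single linear scan over the locations of the block it is opening, so no sort is needed anywhere and the per-open cost moves off the namenode.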