[ https://issues.apache.org/jira/browse/HADOOP-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573885#action_12573885 ]
Owen O'Malley commented on HADOOP-2559:
---------------------------------------
OK, I'd like a little more data, please, Lohit. Please make sure you test with
HADOOP-1985.
I'd like to see two tests:
A full 200-node test:
  1. random writer
  2. scan (just map-reduce input, no shuffle or reduces)
     a. record the number of node-local and rack-local maps (see the counter sketch after this comment)
A lopsided 200-node test:
  1. random writer with 10 maps
     a. block distribution by node
     b. node distribution by rack
  2. 200-node scan
     a. node distribution by rack
     b. number of node-local and rack-local maps
Thanks!
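For reference, here is a minimal sketch of the kind of scan job requested above: a map-only job over the randomwriter output that reports how many of its maps ran node-local versus rack-local. This is not from the issue itself; the class name is made up, it assumes the old org.apache.hadoop.mapred API, and it assumes the DATA_LOCAL_MAPS / RACK_LOCAL_MAPS job counters are present in the build under test (counter group and names vary across Hadoop versions, so verify them against the job history UI).

{code:java}
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.mapred.Counters;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RunningJob;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class ScanLocality {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(ScanLocality.class);
    job.setJobName("scan");
    // args[0]: the randomwriter output directory (SequenceFiles of BytesWritable).
    FileInputFormat.setInputPaths(job, new Path(args[0]));
    job.setInputFormat(SequenceFileInputFormat.class);
    job.setMapperClass(IdentityMapper.class);
    job.setNumReduceTasks(0);                     // map-only: no shuffle, no reduces
    job.setOutputFormat(NullOutputFormat.class);  // scan only, write nothing back
    job.setOutputKeyClass(BytesWritable.class);
    job.setOutputValueClass(BytesWritable.class);

    RunningJob running = JobClient.runJob(job);
    Counters counters = running.getCounters();
    // Counter group/name are assumptions; check them against the cluster's UI.
    long nodeLocal = counters.findCounter(
        "org.apache.hadoop.mapred.JobInProgress$Counter", "DATA_LOCAL_MAPS").getCounter();
    long rackLocal = counters.findCounter(
        "org.apache.hadoop.mapred.JobInProgress$Counter", "RACK_LOCAL_MAPS").getCounter();
    System.out.println("node-local maps: " + nodeLocal);
    System.out.println("rack-local maps: " + rackLocal);
  }
}
{code}

The same locality counters also appear on the job detail page of the web UI, so the numbers can be read off there instead of programmatically if that is easier.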
> DFS should place one replica per rack
> -------------------------------------
>
> Key: HADOOP-2559
> URL: https://issues.apache.org/jira/browse/HADOOP-2559
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: Runping Qi
> Assignee: lohit vijayarenu
> Attachments: HADOOP-2559-1.patch, HADOOP-2559-2.patch
>
>
> Currently, when writing out a block, dfs places one copy on the local data
> node, one copy on a rack-local node, and one copy on a remote node. This leads
> to a number of undesired properties:
> 1. The block will be rack-local to two racks instead of three, reducing the
> advantage of rack-locality-based scheduling by 1/3.
> 2. The blocks of a file (especially a large file) are unevenly distributed
> over the nodes: one third will be on the local node, and two thirds on nodes
> on the same rack. This may fill some nodes much faster than others, increasing
> the need for rebalancing. Furthermore, it also makes some nodes become "hot
> spots" if those big files are popular and accessed by many applications.
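The contrast between the current placement and the proposed one may be easier to see in code. The sketch below is only a toy illustration for a replication factor of 3; it is not the actual dfs block placement code, and the class, node, and rack names are made up.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class PlacementSketch {

  static class Node {
    final String name, rack;
    Node(String name, String rack) { this.name = name; this.rack = rack; }
    public String toString() { return name + "(" + rack + ")"; }
  }

  /** Current policy: writer's node, a second node on the same rack, one remote node. */
  static List<Node> currentPolicy(Node writer, List<Node> cluster) {
    List<Node> targets = new ArrayList<Node>();
    targets.add(writer);                                   // replica 1: local node
    for (Node n : cluster)                                 // replica 2: same rack
      if (n != writer && n.rack.equals(writer.rack)) { targets.add(n); break; }
    for (Node n : cluster)                                 // replica 3: some other rack
      if (!n.rack.equals(writer.rack)) { targets.add(n); break; }
    return targets;                                        // spans only two racks
  }

  /** Proposed policy: one replica on each of three distinct racks. */
  static List<Node> oneReplicaPerRack(Node writer, List<Node> cluster) {
    List<Node> targets = new ArrayList<Node>();
    List<String> usedRacks = new ArrayList<String>();
    targets.add(writer);                                   // replica 1: local node
    usedRacks.add(writer.rack);
    for (Node n : cluster) {                               // replicas 2 and 3:
      if (targets.size() == 3) break;                      // a new rack each time
      if (!usedRacks.contains(n.rack)) {
        targets.add(n);
        usedRacks.add(n.rack);
      }
    }
    return targets;                                        // spans three racks
  }

  public static void main(String[] args) {
    List<Node> cluster = new ArrayList<Node>();
    String[] racks = { "/rackA", "/rackB", "/rackC" };
    for (String r : racks)
      for (int i = 0; i < 3; i++)
        cluster.add(new Node(r.substring(1) + "-dn" + i, r));
    Node writer = cluster.get(0);                          // rackA-dn0
    System.out.println("current:  " + currentPolicy(writer, cluster));
    System.out.println("proposed: " + oneReplicaPerRack(writer, cluster));
  }
}
{code}

With the current policy the three replicas land on two racks (the writer's rack twice, one remote rack once); with one replica per rack they land on three racks, which evens out per-node block counts and preserves the rack-locality advantage the issue describes.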