[jira] Commented: (HADOOP-2559) DFS should place one replica per rack

Runping Qi (JIRA) Wed, 05 Mar 2008 06:32:53 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12575363#action_12575363 ]


Runping Qi commented on HADOOP-2559:
------------------------------------


Lohit,

Great work!

Clearly, the block distribution with patch2 is better than the one with 
patch1, and much better than that with trunk.
For the scan job, patch1 and patch2 performed about the same, and both were 
about 20% better than trunk.

The numbers for the random writers are interesting.
The first test, where 200 nodes write concurrently, shows that trunk and 
patch1 were about 15% better than patch2.
Test 2 shows that patch2 performed best, and that both patch1 and patch2 were 
better than trunk!
I suspect that the disk space of the nodes running the mappers might have 
reached its limit, so blocks could not be placed on local nodes.




> DFS should place one replica per rack
> -------------------------------------
>
>                 Key: HADOOP-2559
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2559
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Runping Qi
>            Assignee: lohit vijayarenu
>         Attachments: HADOOP-2559-1.patch, HADOOP-2559-2.patch, 
> Patch1_Block_Report.png.jpg, Patch1_Rack_Node_Mapping.jpg, Patch2 Block 
> Report.jpg, Patch2_Rack_Node_Mapping.jpg, Trunk_Block_Report.png, 
> Trunk_Rack_Node_Mapping.jpg
>
>
> Currently, when writing out a block, dfs will place one copy on the local 
> data node, one copy on a rack-local node, 
> and another one on a remote node. This leads to a number of undesired 
> properties:
> 1. The block will be rack-local to two racks instead of three, reducing the 
> advantage of rack-locality-based scheduling by 1/3.
> 2. The blocks of a file (especially a large file) are unevenly distributed 
> over the nodes: one third will be on the local node, and two thirds on the 
> nodes on the same rack. This may make some nodes fill up much faster than 
> others, 
> increasing the need for rebalancing. Furthermore, this also makes some nodes 
> become "hot spots" if those big 
> files are popular and accessed by many applications.
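
To make the difference concrete, here is a rough sketch that contrasts the 
placement described above (local node, rack-local node, remote node) with a 
one-replica-per-rack placement. It is only an illustration: the topology, 
the node names, and the functions place_current, place_one_per_rack and 
racks_of are all hypothetical, and this is not the code in either patch or 
in the dfs itself.

```python
import random

# Hypothetical cluster: rack name -> data nodes on that rack.
TOPOLOGY = {
    "rack1": ["r1n1", "r1n2", "r1n3"],
    "rack2": ["r2n1", "r2n2", "r2n3"],
    "rack3": ["r3n1", "r3n2", "r3n3"],
    "rack4": ["r4n1", "r4n2", "r4n3"],
}


def rack_of(node, topology):
    """Return the rack that holds the given node."""
    return next(rack for rack, nodes in topology.items() if node in nodes)


def place_current(writer, topology, rng):
    """Placement as described in the issue: the writer's own node,
    another node on the writer's rack, and a node on some other rack."""
    local_rack = rack_of(writer, topology)
    rack_local = rng.choice([n for n in topology[local_rack] if n != writer])
    remote_rack = rng.choice([r for r in topology if r != local_rack])
    return [writer, rack_local, rng.choice(topology[remote_rack])]


def place_one_per_rack(writer, topology, rng, replication=3):
    """Assumed alternative: the writer's own node plus one node on each
    of (replication - 1) distinct other racks."""
    local_rack = rack_of(writer, topology)
    other_racks = rng.sample([r for r in topology if r != local_rack],
                             replication - 1)
    return [writer] + [rng.choice(topology[r]) for r in other_racks]


def racks_of(replicas, topology):
    """Set of racks that hold at least one of the replicas."""
    return {rack_of(node, topology) for node in replicas}


if __name__ == "__main__":
    rng = random.Random(0)
    current = place_current("r1n1", TOPOLOGY, rng)
    spread = place_one_per_rack("r1n1", TOPOLOGY, rng)
    # The current placement always covers 2 racks, the alternative 3.
    print(current, len(racks_of(current, TOPOLOGY)))
    print(spread, len(racks_of(spread, TOPOLOGY)))
```

With three replicas, the first function always covers two racks and the 
second always covers three, which is the 1/3 loss in rack-local scheduling 
choices mentioned in point 1.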

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
