[ https://issues.apache.org/jira/browse/HADOOP-2559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571670#action_12571670 ]
lohit vijayarenu commented on HADOOP-2559:
------------------------------------------
I ran the same set of experiments 4 times instead of 2. Here are the results:
{noformat}
Job           Trunk  Trunk+patch1  Trunk+patch2
RandomWriter   1346           923          1607
RandomWriter    743           571          1111
RandomWriter    698           497          1003
RandomWriter    776           508           963
Sort           1535          2027          1802
Sort           1466          1869          1768
Sort           1618          1787          1738
Sort           1699          2044          1515
{noformat}
An interesting note is that writes are faster with Trunk+patch1 than with Trunk.
During sort, I see many tasks failing with ChecksumException; they succeed on
other nodes on retry, which affects the times shown for the sort jobs.
log: org.apache.hadoop.fs.ChecksumException: Checksum error: /tmps/3/gs203727-22269-2527764705241834/mapred-tt/mapred-local/task_200802222124_0007_m_001749_0/file.out at 17018368
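To make the comparison easier to read, the four runs per configuration can be averaged. This is only a rough sketch over the numbers in the table above; the names `results` and `summarize` are made up for illustration, and the Sort averages should be read with caution since those runs include retried tasks after ChecksumException failures.

```python
from statistics import mean

# Run times copied from the results table (units as reported there).
results = {
    "RandomWriter": {
        "Trunk": [1346, 743, 698, 776],
        "Trunk+patch1": [923, 571, 497, 508],
        "Trunk+patch2": [1607, 1111, 1003, 963],
    },
    "Sort": {
        "Trunk": [1535, 1466, 1618, 1699],
        "Trunk+patch1": [2027, 1869, 1787, 2044],
        "Trunk+patch2": [1802, 1768, 1738, 1515],
    },
}


def summarize(results):
    """Return {job: {configuration: mean run time over all runs}}."""
    return {
        job: {cfg: mean(times) for cfg, times in cfgs.items()}
        for job, cfgs in results.items()
    }


summary = summarize(results)
for job, cfgs in summary.items():
    for cfg, avg in cfgs.items():
        print(f"{job:<13}{cfg:<14}{avg:8.2f}")
```

With only four runs per cell, and a first RandomWriter run that is noticeably slower in every column, the means are indicative rather than conclusive.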
Runping suggested we run wordcount instead; I will do that and post the results.
> DFS should place one replica per rack
> -------------------------------------
>
> Key: HADOOP-2559
> URL: https://issues.apache.org/jira/browse/HADOOP-2559
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: Runping Qi
> Assignee: lohit vijayarenu
> Attachments: HADOOP-2559-1.patch, HADOOP-2559-2.patch
>
>
> Currently, when writing out a block, dfs will place one copy on a local data
> node, one copy on a rack-local node,
> and another one on a remote node. This leads to a number of undesired
> properties:
> 1. The block will be rack-local to two racks instead of three, reducing the
> advantage of rack-locality-based scheduling by 1/3.
> 2. The blocks of a file (especially a large file) are unevenly distributed
> over the nodes: one third will be on the local node, and two thirds on the
> nodes on the same rack. This may make some nodes fill up much faster than
> others,
> increasing the need for rebalancing. Furthermore, this also makes some nodes
> become "hot spots" if those big
> files are popular and accessed by many applications.
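The difference between the two placement schemes in the issue description can be sketched roughly as follows. This is an illustrative model only, not the code from either attached patch: the `topology` mapping, the function names, and the choice to keep the first replica on the writer's node in the one-per-rack variant are all assumptions made for the example.

```python
import random


def rack_of(node, topology):
    """Return the rack that contains the given node."""
    return next(rack for rack, nodes in topology.items() if node in nodes)


def current_placement(writer, topology, rng):
    """Placement as described in the issue: local node, a second node on
    the same rack, and a third node on a different rack."""
    local_rack = rack_of(writer, topology)
    same_rack_node = rng.choice([n for n in topology[local_rack] if n != writer])
    remote_rack = rng.choice([r for r in topology if r != local_rack])
    return [writer, same_rack_node, rng.choice(topology[remote_rack])]


def one_per_rack_placement(writer, topology, rng, replicas=3):
    """Proposed behaviour: every replica on a different rack.
    Assumption: the first replica stays on the writer's node."""
    local_rack = rack_of(writer, topology)
    other_racks = rng.sample([r for r in topology if r != local_rack], replicas - 1)
    return [writer] + [rng.choice(topology[r]) for r in other_racks]


# Hypothetical cluster: 4 racks with 3 nodes each.
topology = {f"rack{r}": [f"r{r}n{n}" for n in range(3)] for r in range(4)}
rng = random.Random(0)

old = current_placement("r0n0", topology, rng)
new = one_per_rack_placement("r0n0", topology, rng)
print(len({rack_of(n, topology) for n in old}), "racks with current placement")
print(len({rack_of(n, topology) for n in new}), "racks with one replica per rack")
```

In this model the current scheme always covers two racks and the one-per-rack scheme always covers three, which is the rack-locality gain the description refers to.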