[ https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12568452#action_12568452 ]

Raghu Angadi commented on HADOOP-2758:
--------------------------------------

With a preliminary patch that removes extra copies on the datanode while reading a 
block, the results are promising.
Ran 4 instances of 'dfs -cat 5GbFile > /dev/null', similar to the tests in 
HADOOP-2144. All the blocks are local.

branch-0.16 : ~4 min. CPU bound; user CPU is 3 times the kernel CPU.
trunk + patch : ~3 min. Disk bound; user CPU is 2 times the kernel CPU. Not much 
idle CPU was left (~10-20%).

Also, from HADOOP-2144, datanode CPU is around 0.9 times DFSClient CPU. Even 
after ignoring idle CPU in the second test, the datanode takes less than half of 
the CPU with the patch. This includes both user and kernel CPU taken by the 
datanode. Assuming kernel CPU is the same in both cases, the user CPU taken by 
the datanode in the second test would be much less than half (maybe closer to 1/3rd).
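The idea behind removing the extra copies can be sketched with Java NIO's FileChannel.transferTo, which asks the kernel to move file bytes to a target channel without staging them in a user-space buffer first. This is only an illustrative sketch, not the actual patch: the class name, buffer size, and the in-memory sink are invented for the demo (true zero-copy kicks in when the target is, e.g., a socket channel; an in-memory sink just exercises the same API).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    public static void main(String[] args) throws IOException {
        // A small temp file standing in for a block on the datanode.
        Path block = Files.createTempFile("block", ".dat");
        byte[] data = new byte[64 * 1024];
        new java.util.Random(0).nextBytes(data);
        Files.write(block, data);

        // Old read path: read into a user-space buffer, then write it out
        // again (an extra copy per buffer). Instead, transferTo lets the
        // kernel move the bytes directly to the target channel.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (FileChannel in = FileChannel.open(block, StandardOpenOption.READ);
             WritableByteChannel out = Channels.newChannel(sink)) {
            long pos = 0, size = in.size();
            while (pos < size) {
                // transferTo may move fewer bytes than requested; loop until done.
                pos += in.transferTo(pos, size - pos, out);
            }
        }
        System.out.println("transferred=" + sink.size()
                + " match=" + java.util.Arrays.equals(data, sink.toByteArray()));
        Files.delete(block);
    }
}
```

On the real read path the savings show up as lower user CPU on the datanode, since the data no longer round-trips through JVM buffers before hitting the socket.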


> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
>                 Key: HADOOP-2758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2758
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.17.0
>
>
> Currently the datanode and client parts of DFS perform multiple copies of data on 
> the 'read path' (i.e. the path from storage on the datanode to the user buffer on 
> the client). This jira reduces these copies by enhancing the data read protocol and 
> the implementation of read on both the datanode and the client. I will describe the 
> changes in the next comment.
> The requirement is that this fix should reduce CPU usage and should not cause a 
> regression in any benchmarks. It might not improve the benchmarks, since most 
> benchmarks are not CPU bound.
