[jira] Commented: (HADOOP-2758) Reduce memory copies when data is read from DFS

Raghu Angadi (JIRA) Fri, 29 Feb 2008 20:14:54 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574066#action_12574066
 ]


Raghu Angadi commented on HADOOP-2758:
--------------------------------------


More comparisons. I hope this shows the improvements.

Test: Run *6* instances of 'cat 5GbFile > /dev/null' using a single node 
cluster. I think all the blocks are located on local disks (RAID0).

The hdfs tests include *constant costs*: client cpu, and kernel cpu not spent 
on behalf of user processes. Client cpu is at least as much as DataNode cpu. 
So a 25% improvement in wall clock time implies more than a 50% improvement 
in DataNode cpu.
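
The arithmetic behind that claim can be sketched as below. This is only an illustration under two assumptions that are not measured here: the test is CPU bound, so wall clock time scales with total cpu, and all of the savings come from the DataNode. The function name and the `datanode_share` parameter are hypothetical.

```python
def datanode_cpu_improvement(wall_clock_improvement, datanode_share):
    """Fractional drop in DataNode cpu needed to explain an overall improvement.

    wall_clock_improvement: fraction of total time saved (0.25 means 25%).
    datanode_share: assumed fraction of total cpu spent in the DataNode
                    before the patch. If client cpu is at least as much as
                    DataNode cpu, this share is at most 0.5.
    """
    if not 0 < datanode_share <= 1:
        raise ValueError("datanode_share must be in (0, 1]")
    # All savings are attributed to the DataNode, so the saved fraction of
    # the total is divided by the DataNode's share of the total.
    return wall_clock_improvement / datanode_share


# DataNode share of exactly one half is the most favourable case;
# any smaller share means a larger DataNode improvement.
print(datanode_cpu_improvement(0.25, 0.5))
```

With a share below one half (kernel cpu is also a constant cost), the same 25% corresponds to more than a 50% reduction in DataNode cpu.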

||Test || Bound By || Run1 || Run2 || Run3 || Avg || Percentage || Note ||
| Trunk | CPU | 355 | 332 | 346 | 344 | 100% | |
| Trunk + patch | CPU | 225 | 213 | 228 | 222 | 65% | |
| cat command | Disk IO | 185 | 83 | 105 | 124 | 36% | Not really comparable|
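
For reference, the Avg and Percentage columns can be recomputed from the three runs with a short sketch like this one. The function name `summarize` is hypothetical, and it assumes the percentages were taken from the rounded averages relative to Trunk, which is what reproduces the figures in the table.

```python
RUNS = {
    "Trunk": [355, 332, 346],
    "Trunk + patch": [225, 213, 228],
    "cat command": [185, 83, 105],
}


def summarize(runs, baseline="Trunk"):
    """Return {test: (rounded average, percentage of baseline average)}."""
    # Round each average first; the percentages are then computed from
    # the rounded values (an assumption about how the table was built).
    averages = {name: round(sum(v) / len(v)) for name, v in runs.items()}
    base = averages[baseline]
    return {name: (avg, round(100 * avg / base)) for name, avg in averages.items()}


for name, (avg, pct) in summarize(RUNS).items():
    print(f"{name}: avg={avg}, {pct}% of Trunk")
```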

Even 21 instances of 'cat allBlocksForFile > /dev/null' were not CPU bound. 
'cat' takes virtually zero cpu in user space. 

 



> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
>                 Key: HADOOP-2758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2758
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-2758.patch, HADOOP-2758.patch, HADOOP-2758.patch, 
> HADOOP-2758.patch, HADOOP-2758.patch, HADOOP-2758.patch
>
>
> Currently datanode and client part of DFS perform multiple copies of data on 
> the 'read path' (i.e. path from storage on datanode to user buffer on the 
> client). This jira reduces these copies by enhancing data read protocol and 
> implementation of read on both datanode and the client. I will describe the 
> changes in next comment.
> Requirement is that this fix should reduce CPU used and should not cause 
> regression in any benchmarks. It might not improve the benchmarks since most 
> benchmarks are not cpu bound.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
