[
https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12574066#action_12574066
]
Raghu Angadi commented on HADOOP-2758:
--------------------------------------
More comparisons. I hope this shows the improvements.
Test : Run *6* instances of 'cat 5GbFile > /dev/null' using a single node
cluster. All the blocks are located on local disks (RAID0), I think.
The hdfs tests include *constant costs*: client CPU, and kernel CPU not spent on
behalf of user processes. Client CPU is at least as much as the DataNode's. This
implies that a 25% improvement in wall clock time corresponds to more than a 50%
improvement in DataNode CPU.
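The arithmetic behind that estimate can be sketched as follows. This is only an illustration under stated assumptions: the function name and the `datanode_share` parameter are hypothetical, the run is taken to be CPU bound so that wall clock time tracks total CPU, and all of the saving is attributed to the DataNode.

```python
def min_datanode_cpu_improvement(wall_clock_improvement, datanode_share=0.5):
    """Lower bound on the fractional reduction in DataNode CPU.

    Assumptions (not from the original comment): the run is CPU bound, so
    wall clock time tracks total CPU, and every second saved comes out of
    the DataNode's share of that total.
    """
    if not 0.0 < datanode_share <= 1.0:
        raise ValueError("datanode_share must be in (0, 1]")
    return wall_clock_improvement / datanode_share


# Client CPU >= DataNode CPU means the DataNode share is at most 50%,
# so a 25% wall clock gain needs at least a 50% DataNode CPU reduction.
print(min_datanode_cpu_improvement(0.25))  # 0.5
```

A smaller DataNode share (for example once kernel CPU is counted in the total) only raises the bound, which is why the estimate reads "more than 50%".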
|| Test || Bound By || Run1 || Run2 || Run3 || Avg || Percentage || Note ||
| Trunk | CPU | 355 | 332 | 346 | 344 | 100% | |
| Trunk + patch | CPU | 225 | 213 | 228 | 222 | 65% | |
| cat command | Disk IO | 185 | 83 | 105 | 124 | 36% | Not really comparable|
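The average and percentage figures in the table can be re-derived from the three runs. The sketch below is illustrative only; the `summarize` helper is hypothetical, and it assumes the percentages were taken from the rounded averages, which is what reproduces the 65% figure.

```python
# Per-run figures copied from the table above.
RUNS = {
    "Trunk": [355, 332, 346],
    "Trunk + patch": [225, 213, 228],
    "cat command": [185, 83, 105],
}


def summarize(runs, baseline="Trunk"):
    """Return {test: (rounded average, percent of the baseline average)}."""
    averages = {name: round(sum(v) / len(v)) for name, v in runs.items()}
    base = averages[baseline]
    return {name: (avg, round(100 * avg / base)) for name, avg in averages.items()}


for name, (avg, pct) in summarize(RUNS).items():
    print(f"{name}: avg={avg}, {pct}% of Trunk")
```

With these inputs the patched run averages 222 against 344 for trunk, that is about 65%, or a roughly 35% reduction in wall clock time.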
Even 21 instances of 'cat allBlocksForFile > /dev/null' were not CPU bound.
'cat' takes virtually zero cpu in user space.
> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
> Key: HADOOP-2758
> URL: https://issues.apache.org/jira/browse/HADOOP-2758
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Fix For: 0.17.0
>
> Attachments: HADOOP-2758.patch, HADOOP-2758.patch, HADOOP-2758.patch,
> HADOOP-2758.patch, HADOOP-2758.patch, HADOOP-2758.patch
>
>
> Currently datanode and client part of DFS perform multiple copies of data on
> the 'read path' (i.e. path from storage on datanode to user buffer on the
> client). This jira reduces these copies by enhancing data read protocol and
> implementation of read on both datanode and the client. I will describe the
> changes in next comment.
> Requirement is that this fix should reduce CPU used and should not cause
> regression in any benchmarks. It might not improve the benchmarks since most
> benchmarks are not cpu bound.