[ https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569375#action_12569375 ]
Raghu Angadi commented on HADOOP-2758:
--------------------------------------
Comparison of a single instance of 'dfs -cat 5Gbfile > /dev/null' with 'cat
5Gbfile > /dev/null'. All the data resides locally on a 4-disk RAID0 partition:
|| min:sec || cat || dfs -cat with 0.16 || dfs -cat with the patch ||
| run 1 | 2:40 | 3:44 | 3:24 |
| run 2 | 2:56 | 3:05 | 3:51 |
| run 3 | 3:01 | 3:18 | 2:51 |
What would you conclude? Both of the obvious conclusions are incorrect:
# dfs -cat is almost as good as a simple cat.
# this patch does not help much.
If we had a single disk partition, the numbers would be even closer.
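One way to see how little the table can resolve is to convert the timings to seconds and compare each column's mean with its run-to-run spread. The following is a minimal Python sketch using only the numbers from the table; the helper names `to_seconds` and `summarize` are illustrative and not part of Hadoop.

```python
# Wall-clock timings copied from the table above, as "min:sec" strings.
timings = {
    "cat": ["2:40", "2:56", "3:01"],
    "dfs -cat with 0.16": ["3:44", "3:05", "3:18"],
    "dfs -cat with the patch": ["3:24", "3:51", "2:51"],
}


def to_seconds(min_sec):
    """Convert a "min:sec" string to a number of seconds."""
    minutes, seconds = min_sec.split(":")
    return int(minutes) * 60 + int(seconds)


def summarize(runs):
    """Return the mean and the max-min spread of the runs, in seconds."""
    secs = [to_seconds(r) for r in runs]
    return {"mean": sum(secs) / len(secs), "spread": max(secs) - min(secs)}


summary = {name: summarize(runs) for name, runs in timings.items()}
for name, stats in summary.items():
    print(f"{name}: mean {stats['mean']:.1f} s, spread {stats['spread']} s")
```

The two dfs -cat columns have means within a second of each other, while the runs inside each column differ by 39 and 60 seconds, so three wall-clock runs cannot separate the two versions.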
> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
> Key: HADOOP-2758
> URL: https://issues.apache.org/jira/browse/HADOOP-2758
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Fix For: 0.17.0
>
> Attachments: HADOOP-2758.patch
>
>
> Currently the datanode and client parts of DFS perform multiple copies of data on
> the 'read path' (i.e. the path from storage on the datanode to the user buffer on
> the client). This jira reduces these copies by enhancing the data read protocol and
> the implementation of read on both the datanode and the client. I will describe the
> changes in the next comment.
> The requirement is that this fix should reduce CPU usage and should not cause a
> regression in any benchmark. It might not improve the benchmarks, since most
> benchmarks are not CPU bound.