[ 
https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569375#action_12569375
 ] 

Raghu Angadi commented on HADOOP-2758:
--------------------------------------


Comparison of a single instance of 'dfs -cat 5Gbfile > /dev/null' with 'cat 
5Gbfile > /dev/null'. All the data resides locally on a 4-disk RAID0 partition:

|| time (min:sec) || cat || dfs -cat with 0.16 || dfs -cat with the patch ||
| run 1 | 2:40 | 3:44 | 3:24 |
| run 2 | 2:56 | 3:05 | 3:51 |
| run 3 | 3:01 | 3:18 | 2:51 |

What would you conclude? Both of the obvious conclusions are incorrect:
# dfs -cat is almost as good as plain cat.
# This patch does not help much.

If we had a single-disk partition, the numbers would be even closer.
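
For anyone who wants to check the arithmetic behind the table, here is a minimal sketch that converts the min:sec values to seconds and prints the mean and the run-to-run spread for each column. The class and method names (ReadTimings, toSeconds, meanSeconds, rangeSeconds) are made up for illustration and are not part of Hadoop or of the patch.

```java
// Hypothetical helper for the timing table above; not part of Hadoop.
final class ReadTimings {

    // Convert a "min:sec" string such as "2:40" to whole seconds.
    static int toSeconds(String minSec) {
        String[] parts = minSec.trim().split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    // Arithmetic mean, in seconds, of a column of "min:sec" values.
    static double meanSeconds(String... runs) {
        int total = 0;
        for (String run : runs) {
            total += toSeconds(run);
        }
        return (double) total / runs.length;
    }

    // Difference, in seconds, between the slowest and the fastest run.
    static int rangeSeconds(String... runs) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (String run : runs) {
            int s = toSeconds(run);
            min = Math.min(min, s);
            max = Math.max(max, s);
        }
        return max - min;
    }

    public static void main(String[] args) {
        String[][] columns = {
            {"cat", "2:40", "2:56", "3:01"},
            {"dfs -cat with 0.16", "3:44", "3:05", "3:18"},
            {"dfs -cat with the patch", "3:24", "3:51", "2:51"},
        };
        for (String[] col : columns) {
            String[] runs = {col[1], col[2], col[3]};
            System.out.printf("%-24s mean %.1f s, spread %d s%n",
                    col[0], meanSeconds(runs), rangeSeconds(runs));
        }
    }
}
```

With the three runs above, the means come to roughly 172 s for cat and 202 s for both dfs -cat columns, while the spread within a single column is 21 s, 39 s and 60 s respectively. The run-to-run variation is far larger than the difference between 0.16 and the patch, so these wall-clock numbers alone cannot show what the patch changes.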




> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
>                 Key: HADOOP-2758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2758
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-2758.patch
>
>
> Currently the datanode and client parts of DFS perform multiple copies of data on 
> the 'read path' (i.e. the path from storage on the datanode to the user buffer on 
> the client). This jira reduces these copies by enhancing the data read protocol and 
> the implementation of read on both the datanode and the client. I will describe the 
> changes in the next comment.
> The requirement is that this fix should reduce CPU usage and should not cause 
> a regression in any benchmark. It might not improve the benchmarks, since most 
> benchmarks are not CPU bound.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
