[jira] Commented: (HADOOP-2758) Reduce memory copies when data is read from DFS

Raghu Angadi (JIRA) Fri, 15 Feb 2008 10:20:28 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569352#action_12569352
 ]


Raghu Angadi commented on HADOOP-2758:
--------------------------------------

Note regarding the 'dfs -cat' numbers: these are end-to-end tests, and the
numbers vary depending on how many instances we run. As in any end-to-end
test, there are multiple factors that this patch does not affect. This patch
reduces the CPU consumed by the DataNode while serving data; that reduction
cannot be read directly off the 'dfs -cat' numbers.

I have _semi-directly_ calculated that the DataNode with the patch takes
*35-45% of the CPU it used to take before*. This calculation uses the 9/10
ratio from HADOOP-2144. 'top' on my dev box truncates the summed-up CPU to
99.9 (unlike on the machine used in HADOOP-2144); otherwise we could directly
compare the CPU taken by the DataNode instead of calculating it indirectly.

Sameer asked me to compare a single instance of 'dfs -cat' with the regular
shell 'cat'. I will add those numbers in the next comment.

> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
>                 Key: HADOOP-2758
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2758
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>             Fix For: 0.17.0
>
>         Attachments: HADOOP-2758.patch
>
>
> Currently the datanode and client parts of DFS perform multiple copies of
> data on the 'read path' (i.e. the path from storage on the datanode to the
> user buffer on the client). This jira reduces these copies by enhancing the
> data read protocol and the implementation of read on both the datanode and
> the client. I will describe the changes in the next comment.
> The requirement is that this fix should reduce CPU usage and should not
> cause a regression in any benchmark. It might not improve the benchmarks,
> since most benchmarks are not CPU bound.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
