[
https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12569352#action_12569352
]
Raghu Angadi commented on HADOOP-2758:
--------------------------------------
Note regarding the 'dfs -cat' numbers: these are end-to-end tests, and the
numbers vary depending on how many instances we run. As in any end-to-end test,
there are multiple factors that are not affected by this patch. This patch
reduces the CPU consumed by the DataNode while serving data, and that reduction
cannot be read directly from the 'dfs -cat' numbers.
I have _semi-directly_ calculated that the DataNode with the patch takes
*35-45% of the CPU it used to take before*. This calculation uses the 9/10
ratio from HADOOP-2144. 'top' on my dev box truncates the summed-up CPU to 99.9
(unlike on the machine used in HADOOP-2144); otherwise we could directly
compare the CPU taken by the DataNode instead of calculating it indirectly.
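For illustration only, a direct per-process comparison of the kind described
above could be sketched as below. This is not the method used for the numbers
in this comment. It assumes a Java 9+ runtime (the ProcessHandle API), and the
class name CpuShare and its two methods are hypothetical.

```java
import java.time.Duration;
import java.util.Optional;

public class CpuShare {
    /** Cumulative CPU time of a process, if the platform reports it. */
    public static Optional<Duration> cpuTime(long pid) {
        return ProcessHandle.of(pid).flatMap(h -> h.info().totalCpuDuration());
    }

    /** CPU time of the patched run as a fraction of the baseline run. */
    public static double relativeCpu(Duration patched, Duration baseline) {
        if (baseline.isZero() || baseline.isNegative()) {
            throw new IllegalArgumentException("baseline CPU time must be positive");
        }
        return (double) patched.toNanos() / baseline.toNanos();
    }

    public static void main(String[] args) {
        // Default to this JVM's own pid when no pid is given on the command line.
        long pid = args.length > 0 ? Long.parseLong(args[0]) : ProcessHandle.current().pid();
        cpuTime(pid).ifPresentOrElse(
            d -> System.out.println("pid " + pid + " CPU time: " + d.toMillis() + " ms"),
            () -> System.out.println("CPU time not available for pid " + pid));
    }
}
```

Sampling cpuTime for the DataNode pid before and after the same workload, once
without and once with the patch, gives two durations that relativeCpu can
compare without depending on the summed figure that 'top' displays.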
Sameer asked me to compare a single instance of 'dfs -cat' with a regular shell
'cat'. I will add those numbers in the next comment.
> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
> Key: HADOOP-2758
> URL: https://issues.apache.org/jira/browse/HADOOP-2758
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Fix For: 0.17.0
>
> Attachments: HADOOP-2758.patch
>
>
> Currently the datanode and client parts of DFS perform multiple copies of
> data on the 'read path' (i.e. the path from storage on the datanode to the
> user buffer on the client). This jira reduces these copies by enhancing the
> data read protocol and the implementation of read on both the datanode and
> the client. I will describe the changes in the next comment.
> The requirement is that this fix should reduce the CPU used and should not
> cause a regression in any benchmark. It might not improve the benchmarks,
> since most benchmarks are not CPU bound.