[
https://issues.apache.org/jira/browse/HADOOP-2758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Raghu Angadi updated HADOOP-2758:
---------------------------------
Attachment: HADOOP-2758.patch
Attached patch removes extra buffer copies when data is read from the data node
(by client or while replicating).
- before: disk --> large BufferedInputStream --> small datanode buffer -->
large BufferedOutputStream --> socket.
- after: disk --> large datanode buffer --> socket.
- each arrow represents a memory copy. I think the cost of the arrows at
either end is shared between user and kernel (using a direct buffer might
reduce that further; I will try it).
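To illustrate the "after" path, here is a minimal sketch (not the code in the
patch): one large buffer sits between the source and the sink, with no
buffered stream wrappers in between. The class name {{SingleBufferSend}}, the
method {{sendBlock()}} and the buffer size are hypothetical and chosen only
for this example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical sketch, not the datanode code from the patch.
final class SingleBufferSend {
    /**
     * Copies everything from 'in' to 'out' through a single buffer of
     * bufSize bytes (bufSize must be > 0). Each read() fills the buffer
     * and each write() drains it, so the data crosses user space once.
     */
    static long sendBlock(InputStream in, OutputStream out, int bufSize)
            throws IOException {
        byte[] buf = new byte[bufSize];  // the one "large datanode buffer"
        long total = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);        // straight to the sink, no wrapper
            total += n;
        }
        out.flush();
        return total;
    }
}
```

In real use 'in' would be the block file stream and 'out' the socket stream;
wrapping either one in a buffered stream would bring back one of the copies
listed above.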
I will post more microbenchmarks similar to those in the last comment.
We can also remove one copy in DFSClient, but the current {{readChunk()}}
interface in {{FSInputChecker}} does not allow it. We could add an optional
{{readChunks()}} so that an implementation gets access to the user's complete
buffer. There would be a default implementation of it. Should I file a jira?
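A rough sketch of what such an optional method with a default implementation
could look like. This is only an illustration: {{ChunkReaderSketch}} is a
hypothetical stand-in, and the signatures are simplified assumptions (no
checksum argument), not the actual {{FSInputChecker}} API.

```java
import java.io.IOException;

// Hypothetical stand-in for the reader interface; signatures are assumed.
abstract class ChunkReaderSketch {
    /** Reads at most one chunk into buf; returns bytes read, or -1 at EOF. */
    public abstract int readChunk(long pos, byte[] buf, int offset, int len)
            throws IOException;

    /**
     * Optional bulk read into the user's complete buffer. The default
     * simply loops over readChunk(); an implementation that can fill the
     * whole buffer in one step may override it and skip the
     * intermediate copy.
     */
    public int readChunks(long pos, byte[] buf, int offset, int len)
            throws IOException {
        int total = 0;
        while (total < len) {
            int n = readChunk(pos + total, buf, offset + total, len - total);
            if (n <= 0) {
                // Nothing more available: report EOF only if nothing was read.
                return (total == 0) ? n : total;
            }
            total += n;
        }
        return total;
    }
}
```

With a default like this, existing implementations keep working unchanged,
and only those that can read directly into the caller's buffer need to
override {{readChunks()}}.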
This patch changes the DATA_TRANSFER_PROTOCOL a bit.
Currently there are no improvements in buffering while writing data to DFS. I
will do that in a follow-up jira.
All the unit tests pass. I will run them on Windows as well. No new tests are
added, since this patch does not change any functionality and is purely a
performance improvement.
> Reduce memory copies when data is read from DFS
> -----------------------------------------------
>
> Key: HADOOP-2758
> URL: https://issues.apache.org/jira/browse/HADOOP-2758
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Reporter: Raghu Angadi
> Assignee: Raghu Angadi
> Fix For: 0.17.0
>
> Attachments: HADOOP-2758.patch
>
>
> Currently datanode and client part of DFS perform multiple copies of data on
> the 'read path' (i.e. path from storage on datanode to user buffer on the
> client). This jira reduces these copies by enhancing data read protocol and
> implementation of read on both datanode and the client. I will describe the
> changes in next comment.
> Requirement is that this fix should reduce CPU used and should not cause
> regression in any benchmarks. It might not improve the benchmarks since most
> benchmarks are not cpu bound.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.