[
https://issues.apache.org/jira/browse/HDFS-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051137#comment-13051137
]
Todd Lipcon commented on HDFS-2080:
-----------------------------------
Nathan: yes, both CPU time and sys time improve with these optimizations.
Kihwal: using zlib instead of the hardware CRC gives only about a 40%
improvement. It's true that a disk won't pump out data at rates approaching
1GB/sec, but Nathan's metric of CPUsecs/MB is still very important, e.g. on
multitenant clusters. Another important case is HBase serving, where the
majority of the data read from HDFS will actually be in the Linux buffer
cache. I've benchmarked that 3/4 of the latency of such reads comes from
CPU time rather than context switching (try TestHFileSeek from HBase on
RawLocalFS vs LocalFS).
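
For anyone who wants a rough feel for the CPUsecs/MB metric, here is a minimal sketch that times zlib's CRC32 over an in-memory buffer. It is not the benchmark used for these patches: the helper name `cpu_secs_per_mb`, the 16 MiB buffer and the round count are all assumptions made for illustration, and Python's `zlib.crc32` is only a stand-in for the checksum code in the HDFS client.

```python
import time
import zlib


def cpu_secs_per_mb(checksum, data, rounds=20):
    """Return process CPU seconds spent per MiB of data fed to checksum().

    Hypothetical helper for illustration; not part of Hadoop or HBase.
    """
    start = time.process_time()  # user + sys CPU time of this process
    for _ in range(rounds):
        checksum(data)
    elapsed = time.process_time() - start
    total_mb = len(data) * rounds / (1024 * 1024)
    return elapsed / total_mb


if __name__ == "__main__":
    # 16 MiB buffer held in memory, so no disk I/O is involved (assumed size).
    buf = bytes(range(256)) * 4096 * 16
    cost = cpu_secs_per_mb(zlib.crc32, buf)
    print("zlib crc32: %.6f CPU-sec/MB" % cost)
```

Because the data is already in memory, the number reflects checksum CPU cost only, which is the buffer-cache case described above. Absolute values will differ from the Java read path.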
> Speed up DFS read path
> ----------------------
>
> Key: HDFS-2080
> URL: https://issues.apache.org/jira/browse/HDFS-2080
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs client
> Affects Versions: 0.23.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.23.0
>
>
> I've developed a series of patches that speed up the HDFS read path by a
> factor of about 2.5x (~300M/sec to ~800M/sec for localhost reading from
> buffer cache) and will also make it easier for advanced users (e.g.
> HBase) to skip a buffer copy.