[ 
https://issues.apache.org/jira/browse/HDFS-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051137#comment-13051137
 ] 

Todd Lipcon commented on HDFS-2080:
-----------------------------------

Nathan: yes, both CPU time and sys time improve with these optimizations.

Kihwal: using zlib instead of the hardware CRC gives only about a 40% 
improvement. It's true that a disk won't pump out data at rates approaching 
1 GB/sec, but Nathan's metric of CPU-secs/MB is still very important, e.g. on 
multitenant clusters. Another important case is HBase serving, where the 
majority of the data being read from HDFS will actually be in the Linux 
buffer cache. I've benchmarked that 3/4 of the latency of such reads comes from 
CPU time rather than context switching (try TestHFileSeek from HBase on 
RawLocalFS vs LocalFS).
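
To make the CPU-secs/MB metric concrete, here is a minimal sketch of how 
one might measure it for checksumming alone. It is not the benchmark used 
for this issue: the class name CrcCpuCost and the helper cpuSecsPerMb are 
hypothetical, java.util.zip.CRC32 stands in for whichever checksum 
implementation is under test, and the 512-byte chunk size is an assumption 
chosen to resemble a typical bytes-per-checksum setting.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Random;
import java.util.zip.CRC32;

public class CrcCpuCost {
    // Keeps the checksum results observable so the loop is not optimized away.
    static volatile long sink;

    /**
     * Hypothetical helper: checksums `data` in `chunkSize`-byte pieces,
     * `passes` times, and returns CPU seconds consumed per MB processed.
     * Uses per-thread CPU time, so time spent blocked is not counted.
     */
    static double cpuSecsPerMb(byte[] data, int chunkSize, int passes) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        CRC32 crc = new CRC32();
        long acc = 0;
        // Nanoseconds of CPU time; some JVMs may not support this call.
        long start = threads.getCurrentThreadCpuTime();
        for (int p = 0; p < passes; p++) {
            for (int off = 0; off < data.length; off += chunkSize) {
                crc.reset();
                crc.update(data, off, Math.min(chunkSize, data.length - off));
                acc += crc.getValue();
            }
        }
        long cpuNanos = threads.getCurrentThreadCpuTime() - start;
        sink = acc;
        double mb = (double) data.length * passes / (1024.0 * 1024.0);
        return (cpuNanos / 1e9) / mb;
    }

    public static void main(String[] args) {
        byte[] data = new byte[64 * 1024 * 1024];
        new Random(42).nextBytes(data);
        cpuSecsPerMb(data, 512, 2); // warm-up pass so the JIT has compiled the loop
        double cost = cpuSecsPerMb(data, 512, 8);
        System.out.printf("CRC32 cost: %.6f CPU-secs/MB%n", cost);
    }
}
```

Because the data stays in memory, the result reflects pure CPU cost, which 
is the regime that matters for reads served from the buffer cache.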

> Speed up DFS read path
> ----------------------
>
>                 Key: HDFS-2080
>                 URL: https://issues.apache.org/jira/browse/HDFS-2080
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.23.0
>
>
> I've developed a series of patches that speeds up the HDFS read path by a 
> factor of about 2.5x (~300 MB/sec to ~800 MB/sec for localhost reads from 
> the buffer cache) and will also make it easier for advanced users (e.g. 
> HBase) to skip a buffer copy. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
