[
https://issues.apache.org/jira/browse/HDFS-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050939#comment-13050939
]
Todd Lipcon commented on HDFS-2080:
-----------------------------------
The improvements are the following:
- Simplify the code for BlockReader by no longer inheriting from
FSInputChecker. Read entire 64KB packets at a time into a direct byte buffer
with a single read() syscall [slight speed improvement]
- Once the entire 64KB buffer is ready, bulk-verify all of the CRCs with a
single call (currently there's a small semantic change associated with this,
but it could be fixed without hurting performance much if necessary) [15%
improvement]
- Implement the bulk-verification of CRC code via JNI [60% improvement]
- On processors supporting SSE4.2 (e.g. Nehalem/Westmere), use the hardware
crc32 instruction, which computes CRC32C, to calculate checksums [~2.5x
improvement]
  - There's one more optimization I haven't done yet here to improve the
    pipelining of the SSE instructions
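To illustrate the bulk-verification step described above, here is a minimal sketch in plain Java. It is not the patch itself: the class name, the 512-byte chunk size, and the layout of the checksums (one 4-byte big-endian value per chunk) are assumptions made for the example, and it uses java.util.zip.CRC32 rather than the JNI or SSE4.2 code paths. The idea it shows is that the whole packet sits in one direct buffer and every chunk CRC is checked in a single call.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

final class BulkCrcSketch {
    // Assumed chunk size for this sketch: bytes covered by each checksum.
    static final int BYTES_PER_CHECKSUM = 512;

    /** CRC32 of the next `len` bytes of `d`; advances d's position past them. */
    private static int nextChunkCrc(ByteBuffer d, CRC32 crc, int len) {
        ByteBuffer chunk = d.slice();
        chunk.limit(len);
        crc.reset();
        crc.update(chunk);
        d.position(d.position() + len);
        return (int) crc.getValue();
    }

    /** Computes one CRC32 per chunk, packed as 4-byte ints in a new buffer. */
    static ByteBuffer computeSums(ByteBuffer data) {
        ByteBuffer d = data.duplicate();
        int chunks = (d.remaining() + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        ByteBuffer sums = ByteBuffer.allocate(chunks * 4);
        CRC32 crc = new CRC32();
        while (d.hasRemaining()) {
            int len = Math.min(BYTES_PER_CHECKSUM, d.remaining());
            sums.putInt(nextChunkCrc(d, crc, len));
        }
        sums.flip();
        return sums;
    }

    /** Verifies every chunk in one call; returns the first bad chunk index, or -1. */
    static int verifyBulk(ByteBuffer data, ByteBuffer sums) {
        ByteBuffer d = data.duplicate();
        ByteBuffer s = sums.duplicate();
        CRC32 crc = new CRC32();
        int index = 0;
        while (d.hasRemaining()) {
            int len = Math.min(BYTES_PER_CHECKSUM, d.remaining());
            int actual = nextChunkCrc(d, crc, len);
            if (s.remaining() < 4 || actual != s.getInt()) {
                return index;
            }
            index++;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in for a 64KB packet read into a direct buffer.
        ByteBuffer packet = ByteBuffer.allocateDirect(64 * 1024);
        for (int i = 0; i < packet.capacity(); i++) {
            packet.put(i, (byte) i);
        }
        ByteBuffer sums = computeSums(packet);
        System.out.println("clean packet: " + verifyBulk(packet, sums));

        // Flip one bit inside the second chunk (bytes 512..1023).
        packet.put(1000, (byte) (packet.get(1000) ^ 1));
        System.out.println("corrupted packet: " + verifyBulk(packet, sums));
    }
}
```

Verifying after the full packet is buffered keeps the per-chunk loop tight and free of I/O, which is also what makes it a natural unit to hand to native code.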
Unfortunately, the SSE4.2 improvement requires introducing a new Checksum
implementation, since the hardware implements the iSCSI polynomial (CRC32C)
instead of the zlib polynomial (CRC32). Fortunately, we have a header field
everywhere we use checksums, so introducing a new polynomial can be done in a
backwards-compatible way.
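As a rough sketch of how a header field can keep a new polynomial backwards-compatible, the reader can pick the Checksum implementation from a type byte stored alongside the data. The class name and the type IDs below are hypothetical and not taken from the patch; java.util.zip.CRC32C requires Java 9 or later and implements the same Castagnoli polynomial used by iSCSI and the SSE4.2 instruction.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

final class ChecksumTypeSketch {
    // Hypothetical header values for this sketch only.
    static final byte TYPE_CRC32 = 1;   // zlib polynomial
    static final byte TYPE_CRC32C = 2;  // iSCSI (Castagnoli) polynomial

    /** Chooses the checksum implementation named by the header's type byte. */
    static Checksum forHeaderType(byte type) {
        switch (type) {
            case TYPE_CRC32:
                return new CRC32();
            case TYPE_CRC32C:
                return new CRC32C();
            default:
                throw new IllegalArgumentException("Unknown checksum type: " + type);
        }
    }

    /** Checksums `data` with whichever polynomial the header type selects. */
    static long checksum(byte type, byte[] data) {
        Checksum c = forHeaderType(type);
        c.update(data, 0, data.length);
        return c.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // Standard check values for the two polynomials.
        System.out.printf("CRC32  = %08x%n", checksum(TYPE_CRC32, data));  // cbf43926
        System.out.printf("CRC32C = %08x%n", checksum(TYPE_CRC32C, data)); // e3069283
    }
}
```

Old data keeps its existing type value and continues to verify with CRC32, while newly written data can declare CRC32C, so the two polynomials never need to be told apart by guessing.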
With these optimizations, performance is within 15% of non-checksummed reads.
> Speed up DFS read path
> ----------------------
>
> Key: HDFS-2080
> URL: https://issues.apache.org/jira/browse/HDFS-2080
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs client
> Affects Versions: 0.23.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.23.0
>
>
> I've developed a series of patches that speeds up the HDFS read path by a
> factor of about 2.5x (~300 MB/sec to ~800 MB/sec for localhost reading from
> buffer cache) and will also make it easier for advanced users (e.g. HBase)
> to skip a buffer copy.