[ https://issues.apache.org/jira/browse/HDFS-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon updated HDFS-2080:
------------------------------
Attachment: hdfs-2080.txt
Here's a combined patch to demonstrate the speed improvements. It will need to
be split up into a few separate changes to be committed. Here's a summary of
the changes in this somewhat large patch:
bq. common/LICENSE.txt | 20 +
- I borrowed some code from some BSD-licensed projects (hstore and
"Slicing-by-8")
bq. common/bin/hadoop-config.sh | 8 +-
- Fixes a bug introduced with RPMs where it wouldn't find native code properly
from within the build dir.
bq. common/build.xml | 9 +
- adds javah for the new NativeCrc32 class
bq. .../java/org/apache/hadoop/util/DataChecksum.java | 64 +++-
- Adds new CHECKSUM_CRC32C type for the "CRC32C" polynomial which has hardware
support.
- Adds a copyOf() to create a new DataChecksum given an existing instance.
- Generalizes some checks for CRC32 to now apply to all size-4 checksums.
- Adds new verifySums function, basically borrowed from FSInputChecker.java but
operating on ByteBuffer instead, and calling out to native code when available
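As a rough illustration of the "bulk verify" idea (one 4-byte checksum per fixed-size chunk, checked over ByteBuffers in a single call), here is a minimal pure-Java sketch. It is not the code from the patch: the class and method names are hypothetical, it uses the JDK's java.util.zip.CRC32C (Java 9+) rather than PureJavaCrc32C or the native path, and the return convention (offset of the first bad chunk, or -1) is an assumption made for the example.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32C;

// Hypothetical helper for illustration only; not DataChecksum.verifySums from the patch.
final class ChunkedSumsSketch {

    /** Computes one 4-byte CRC32C per bytesPerChunk bytes of data (last chunk may be short). */
    static ByteBuffer computeSums(ByteBuffer data, int bytesPerChunk) {
        ByteBuffer d = data.duplicate();           // leave the caller's position untouched
        int chunks = (d.remaining() + bytesPerChunk - 1) / bytesPerChunk;
        ByteBuffer sums = ByteBuffer.allocate(chunks * 4);
        CRC32C crc = new CRC32C();
        while (d.hasRemaining()) {
            int n = Math.min(bytesPerChunk, d.remaining());
            ByteBuffer chunk = d.slice();
            chunk.limit(n);                        // view over just this chunk
            crc.reset();
            crc.update(chunk);
            sums.putInt((int) crc.getValue());
            d.position(d.position() + n);
        }
        sums.flip();
        return sums;
    }

    /** Returns the byte offset of the first chunk whose checksum mismatches, or -1 if all match. */
    static long firstBadChunkOffset(ByteBuffer data, ByteBuffer sums, int bytesPerChunk) {
        ByteBuffer d = data.duplicate();
        ByteBuffer s = sums.duplicate();
        CRC32C crc = new CRC32C();
        long offset = 0;
        while (d.hasRemaining()) {
            int n = Math.min(bytesPerChunk, d.remaining());
            ByteBuffer chunk = d.slice();
            chunk.limit(n);
            crc.reset();
            crc.update(chunk);
            // A missing stored checksum is treated as a mismatch in this sketch.
            if (s.remaining() < 4 || s.getInt() != (int) crc.getValue()) {
                return offset;
            }
            d.position(d.position() + n);
            offset += n;
        }
        return -1;
    }
}
```

Verifying a whole buffer per call is what gives a native implementation something worthwhile to do per JNI crossing; this sketch only shows the chunking logic.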
bq. .../java/org/apache/hadoop/util/NativeCrc32.java | 68 +++
- Small wrapper around the new native code
bq. .../org/apache/hadoop/util/PureJavaCrc32C.java | 454 ++++++++++++++++
- copy of PureJavaCrc32 but for the new polynomial. Identical code but
different tables.
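To make "identical code but different tables" concrete, here is a small sketch of a byte-at-a-time table-driven CRC where the only thing that changes between CRC32 and CRC32C is the polynomial used to generate the table. The class and method names are made up for this example and this is not PureJavaCrc32C itself; the two constants are the standard bit-reflected forms of the zlib and Castagnoli polynomials.

```java
// Hypothetical illustration; not the PureJavaCrc32 / PureJavaCrc32C source.
final class CrcTableSketch {
    static final int CRC32_POLY = 0xEDB88320;   // zlib / IEEE 802.3 polynomial, bit-reflected
    static final int CRC32C_POLY = 0x82F63B78;  // Castagnoli polynomial, bit-reflected

    /** Builds the 256-entry lookup table for a reflected 32-bit polynomial. */
    static int[] makeTable(int reflectedPoly) {
        int[] table = new int[256];
        for (int i = 0; i < 256; i++) {
            int crc = i;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ reflectedPoly : crc >>> 1;
            }
            table[i] = crc;
        }
        return table;
    }

    /** Same update loop for any polynomial; only the table differs. */
    static int crc(int[] table, byte[] data) {
        int crc = 0xFFFFFFFF;                    // standard initial value
        for (byte b : data) {
            crc = (crc >>> 8) ^ table[(crc ^ b) & 0xFF];
        }
        return ~crc;                             // standard final inversion
    }
}
```

For example, crc(makeTable(CRC32_POLY), bytes) should agree with java.util.zip.CRC32, and the CRC32C_POLY variant with java.util.zip.CRC32C (Java 9+).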
bq. common/src/native/Makefile.am | 6 +-
- adds new C code for crc32
bq. .../src/org/apache/hadoop/util/NativeCrc32.c | 149 ++++++
- implementation of verifySums using the native code
bq. .../src/native/src/org/apache/hadoop/util/crc32.h | 133 +++++
- C implementations of "slicing-by-8" for CRC32 and CRC32C
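For readers unfamiliar with "slicing-by-8": it consumes eight input bytes per loop iteration using eight 256-entry tables instead of one. The sketch below shows the technique in Java for CRC32C, purely as an illustration; the C code in the patch is the real implementation, and the names here are hypothetical.

```java
// Hypothetical Java rendering of the slicing-by-8 technique; not the patch's crc32.h.
final class SlicingBy8Sketch {
    static final int CRC32C_POLY = 0x82F63B78; // Castagnoli polynomial, bit-reflected

    /** t[0] is the ordinary byte-at-a-time table; t[k] accounts for k further bytes of shift. */
    static int[][] makeTables(int reflectedPoly) {
        int[][] t = new int[8][256];
        for (int i = 0; i < 256; i++) {
            int crc = i;
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ reflectedPoly : crc >>> 1;
            }
            t[0][i] = crc;
        }
        for (int k = 1; k < 8; k++) {
            for (int i = 0; i < 256; i++) {
                int prev = t[k - 1][i];
                t[k][i] = (prev >>> 8) ^ t[0][prev & 0xFF];
            }
        }
        return t;
    }

    /** Reads four bytes as a little-endian int. */
    static int leInt(byte[] b, int off) {
        return (b[off] & 0xFF)
            | ((b[off + 1] & 0xFF) << 8)
            | ((b[off + 2] & 0xFF) << 16)
            | ((b[off + 3] & 0xFF) << 24);
    }

    static int crc(int[][] t, byte[] data) {
        int crc = 0xFFFFFFFF;
        int off = 0;
        // Main loop: eight bytes per iteration, one lookup in each of the eight tables.
        while (data.length - off >= 8) {
            int lo = crc ^ leInt(data, off);
            int hi = leInt(data, off + 4);
            crc = t[7][lo & 0xFF] ^ t[6][(lo >>> 8) & 0xFF]
                ^ t[5][(lo >>> 16) & 0xFF] ^ t[4][lo >>> 24]
                ^ t[3][hi & 0xFF] ^ t[2][(hi >>> 8) & 0xFF]
                ^ t[1][(hi >>> 16) & 0xFF] ^ t[0][hi >>> 24];
            off += 8;
        }
        // Tail: fall back to one byte at a time.
        while (off < data.length) {
            crc = (crc >>> 8) ^ t[0][(crc ^ data[off++]) & 0xFF];
        }
        return ~crc;
    }
}
```

The eight lookups per iteration are independent of one another, which is where the speedup over the one-table loop comes from.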
bq. .../hadoop/util/crc32_zlib_polynomial_tables.h | 552 ++++++++++++++++++++
bq. .../src/org/apache/hadoop/util/crc32c_tables.h | 313 +++++++++++
- codegenned tables for the above algorithms
bq. .../native/src/org/apache/hadoop/util/x86_crc32c.h | 181 +++++++
- cpu-detection code to determine if SSE4.2 extensions are available
- implementations using the hardware crc32 operation for 32-bit and 64-bit
bq. .../org/apache/hadoop/util/TestPureJavaCrc32.java | 14 +-
- improvements to generate Table for arbitrary polynomials
bq. hdfs/build.xml | 1 +
- lets hdfs pick up the native libraries from the built common (this change
might be a bad idea)
bq. .../java/org/apache/hadoop/hdfs/BlockReader.java | 330 +++++--------
- rewrites BlockReader to not inherit from FSInputChecker, thus making it much
simpler
- now calls the "bulk verify" method in DataChecksum
bq. .../org/apache/hadoop/hdfs/DFSOutputStream.java | 5 +-
- change default checksum to CRC32C
bq. .../org/apache/hadoop/hdfs/DFSInputStream.java | 8 +-
bq. .../hadoop/hdfs/server/common/JspHelper.java | 7 +-
- use IOUtils.readFully instead of the duplicate code from BlockReader
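For context, a "read fully" helper is just a loop that keeps calling read() until the requested number of bytes has arrived. The following is a minimal stand-alone sketch of that loop, not Hadoop's IOUtils.readFully itself; the class name and the choice of EOFException are assumptions for the example.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in showing the kind of loop that was duplicated; not Hadoop's IOUtils.
final class ReadFullySketch {
    /** Reads exactly len bytes into buf starting at off, or throws if the stream ends early. */
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        int remaining = len;
        while (remaining > 0) {
            int n = in.read(buf, off, remaining);   // may return fewer bytes than requested
            if (n < 0) {
                throw new EOFException("Premature EOF with " + remaining + " bytes left to read");
            }
            off += n;
            remaining -= n;
        }
    }
}
```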
bq. .../hadoop/hdfs/server/datanode/BlockReceiver.java | 3 +-
- don't assume checksum is always CRC32 for partial chunk append
> Speed up DFS read path
> ----------------------
>
> Key: HDFS-2080
> URL: https://issues.apache.org/jira/browse/HDFS-2080
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs client
> Affects Versions: 0.23.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Fix For: 0.23.0
>
> Attachments: hdfs-2080.txt
>
>
> I've developed a series of patches that speed up the HDFS read path by a
> factor of about 2.5x (~300M/sec to ~800M/sec for localhost reading from
> buffer cache) and will also make it easier for advanced users (e.g.
> HBase) to skip a buffer copy.