[ https://issues.apache.org/jira/browse/HDFS-2080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-2080:
------------------------------

    Attachment: hdfs-2080.txt

Here's a combined patch that demonstrates the speed improvements. It will need to 
be split into a few separate changes before it can be committed. Summary of the 
changes in this somewhat large patch:

bq. common/LICENSE.txt                                 |   20 +
- I borrowed some code from two BSD-licensed projects (hstore and 
"Slicing-by-8"), so their license text is added here

bq. common/bin/hadoop-config.sh                        |    8 +-
- Fixes a bug, introduced with the RPM packaging, where the native code was not 
found when running from within the build dir.

bq.  common/build.xml                                   |    9 +
- adds a javah step to generate the JNI header for the new NativeCrc32 class

bq.  .../java/org/apache/hadoop/util/DataChecksum.java  |   64 +++-
- Adds new CHECKSUM_CRC32C type for the "CRC32C" polynomial which has hardware 
support.
- Adds a copyOf() to create a new DataChecksum given an existing instance.
- Generalizes some checks for CRC32 to now apply to all size-4 checksums.
- Adds a new verifySums function, largely borrowed from FSInputChecker.java but 
operating on a ByteBuffer instead, which calls out to native code when available.
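
To make the ByteBuffer-based verification concrete, here's a rough Java sketch of a bulk loop in the spirit of verifySums. It assumes one 4-byte big-endian checksum per bytesPerChecksum-sized chunk and uses the JDK's java.util.zip.CRC32C (Java 9+) as a stand-in for both the pure-Java and native paths. The class and method names are hypothetical, not the patch's actual API.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.zip.CRC32C;

/** Hypothetical sketch of bulk chunk verification; not the patch's DataChecksum code. */
final class BulkVerifySketch {
    private BulkVerifySketch() {}

    /**
     * Checks each bytesPerChecksum-sized chunk of data (the last one may be short)
     * against the next 4-byte big-endian value in sums. Neither buffer's
     * position or limit is modified.
     */
    static void verifySums(ByteBuffer data, ByteBuffer sums, int bytesPerChecksum)
            throws IOException {
        CRC32C crc = new CRC32C();
        int sumPos = sums.position();
        for (int pos = data.position(); pos < data.limit(); pos += bytesPerChecksum) {
            int n = Math.min(bytesPerChecksum, data.limit() - pos);
            // A duplicate() view lets us checksum the chunk in place, without copying.
            ByteBuffer chunk = data.duplicate();
            chunk.position(pos);
            chunk.limit(pos + n);
            crc.reset();
            crc.update(chunk);
            int expected = sums.getInt(sumPos); // absolute get; big-endian by default
            if ((int) crc.getValue() != expected) {
                throw new IOException("Checksum error at data offset " + pos);
            }
            sumPos += 4;
        }
    }
}
```

Using absolute indexes and duplicate() views means the caller's buffers are left untouched, which matters when the same buffer is later handed to the user.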

bq. .../java/org/apache/hadoop/util/NativeCrc32.java   |   68 +++
- Small wrapper around the new native code
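
A wrapper like this typically tries to load the native library once and falls back to pure Java when it isn't there. A minimal sketch of that pattern, where the library name, class name and JNI method are all hypothetical and the fallback uses the JDK's CRC32C (Java 9+) rather than Hadoop's own classes:

```java
import java.util.zip.CRC32C;

/** Hypothetical native-with-fallback wrapper; not the patch's NativeCrc32. */
final class NativeCrc32Sketch {
    /** True only if the (hypothetical) native library loaded successfully. */
    static final boolean NATIVE_AVAILABLE;

    static {
        boolean loaded;
        try {
            System.loadLibrary("crc32sketch"); // hypothetical library name
            loaded = true;
        } catch (UnsatisfiedLinkError | SecurityException e) {
            loaded = false; // no native code: use the pure-Java path below
        }
        NATIVE_AVAILABLE = loaded;
    }

    // Hypothetical JNI entry point; only invoked when NATIVE_AVAILABLE is true.
    private static native int nativeCrc32c(byte[] buf, int off, int len);

    private NativeCrc32Sketch() {}

    /** CRC32C of buf[off..off+len), via native code if loaded, else the JDK. */
    static int crc32c(byte[] buf, int off, int len) {
        if (NATIVE_AVAILABLE) {
            return nativeCrc32c(buf, off, len);
        }
        CRC32C crc = new CRC32C();
        crc.update(buf, off, len);
        return (int) crc.getValue();
    }
}
```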

bq. .../org/apache/hadoop/util/PureJavaCrc32C.java     |  454 ++++++++++++++++
- a copy of PureJavaCrc32 for the new polynomial: identical code, different 
tables.
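
Since the two classes differ only in their tables, the relationship can be illustrated with one byte-at-a-time table-driven routine whose table is built from a reflected polynomial (0xEDB88320 for zlib CRC32, 0x82F63B78 for CRC32C). This is a simplified sketch for illustration, not PureJavaCrc32's actual code.

```java
/** Hypothetical sketch: the same CRC loop, parameterized only by its lookup table. */
final class TableCrc {
    static final int CRC32_POLY = 0xEDB88320;  // reflected zlib/IEEE polynomial
    static final int CRC32C_POLY = 0x82F63B78; // reflected Castagnoli polynomial

    private final int[] table;
    private int crc = 0xFFFFFFFF; // standard initial value

    TableCrc(int reflectedPoly) {
        table = makeTable(reflectedPoly);
    }

    /** Builds the 256-entry table by running the bitwise CRC over each byte value. */
    static int[] makeTable(int poly) {
        int[] t = new int[256];
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ poly : c >>> 1;
            }
            t[n] = c;
        }
        return t;
    }

    void update(byte[] b, int off, int len) {
        int c = crc;
        for (int i = off; i < off + len; i++) {
            c = (c >>> 8) ^ table[(c ^ b[i]) & 0xFF];
        }
        crc = c;
    }

    /** Final XOR, returned as an unsigned 32-bit value like java.util.zip.Checksum. */
    long getValue() {
        return (~crc) & 0xFFFFFFFFL;
    }

    void reset() {
        crc = 0xFFFFFFFF;
    }
}
```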

bq. common/src/native/Makefile.am                      |    6 +-
- adds the new C sources for CRC32 to the native build

bq. .../src/org/apache/hadoop/util/NativeCrc32.c       |  149 ++++++
- implementation of verifySums using the native code

bq. .../src/native/src/org/apache/hadoop/util/crc32.h  |  133 +++++
- C implementations of "slicing-by-8" for CRC32 and CRC32C
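
For readers unfamiliar with slicing-by-8: it extends the one-table CRC loop to eight 256-entry tables so that eight input bytes are folded in per iteration. A Java transcription of the general technique for CRC32C, written for illustration and not taken from the patch's crc32.h:

```java
/** Hypothetical slicing-by-8 CRC32C sketch (reflected polynomial 0x82F63B78). */
final class SlicingBy8Crc32c {
    // T[k][n] = CRC effect of byte n followed by k zero bytes.
    private static final int[][] T = new int[8][256];

    static {
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ 0x82F63B78 : c >>> 1;
            }
            T[0][n] = c;
        }
        for (int n = 0; n < 256; n++) {
            for (int k = 1; k < 8; k++) {
                T[k][n] = (T[k - 1][n] >>> 8) ^ T[0][T[k - 1][n] & 0xFF];
            }
        }
    }

    private SlicingBy8Crc32c() {}

    static int crc32c(byte[] b, int off, int len) {
        int c = 0xFFFFFFFF;
        int i = off;
        int end = off + len;
        // Main loop: fold 8 bytes per iteration using eight table lookups.
        for (; end - i >= 8; i += 8) {
            c ^= (b[i] & 0xFF) | (b[i + 1] & 0xFF) << 8
                    | (b[i + 2] & 0xFF) << 16 | (b[i + 3] & 0xFF) << 24;
            c = T[7][c & 0xFF] ^ T[6][(c >>> 8) & 0xFF]
                    ^ T[5][(c >>> 16) & 0xFF] ^ T[4][c >>> 24]
                    ^ T[3][b[i + 4] & 0xFF] ^ T[2][b[i + 5] & 0xFF]
                    ^ T[1][b[i + 6] & 0xFF] ^ T[0][b[i + 7] & 0xFF];
        }
        // Tail: remaining bytes one at a time with the basic table.
        for (; i < end; i++) {
            c = (c >>> 8) ^ T[0][(c ^ b[i]) & 0xFF];
        }
        return ~c;
    }
}
```

The first byte of each 8-byte group still has eight byte-steps to go, so it indexes T[7]; the last byte has one step left and indexes T[0].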

bq. .../hadoop/util/crc32_zlib_polynomial_tables.h     |  552 ++++++++++++++++++++
bq. .../src/org/apache/hadoop/util/crc32c_tables.h     |  313 +++++++++++
- code-generated lookup tables for the above algorithms

bq. .../native/src/org/apache/hadoop/util/x86_crc32c.h |  181 +++++++
- CPU-detection code to determine whether SSE4.2 extensions are available
- implementations using the hardware crc32 instruction for 32-bit and 64-bit

bq. .../org/apache/hadoop/util/TestPureJavaCrc32.java  |   14 +-
- improvements so the table generator works for arbitrary polynomials
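
Generating a table for an arbitrary polynomial boils down to running the bitwise CRC loop over every byte value and printing the result as source. A hypothetical generator that emits a C array; the output layout is made up here and need not match the checked-in headers:

```java
/** Hypothetical sketch: emit a 256-entry CRC lookup table as C source. */
final class CrcTableCodegen {
    private CrcTableCodegen() {}

    static String generate(String name, int reflectedPoly) {
        StringBuilder sb = new StringBuilder();
        sb.append("static const uint32_t ").append(name).append("[256] = {\n");
        for (int n = 0; n < 256; n++) {
            int c = n;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ reflectedPoly : c >>> 1;
            }
            if (n % 4 == 0) {
                sb.append("  "); // indent each row of four entries
            }
            sb.append(String.format("0x%08X", c)); // int formats as unsigned hex
            sb.append(n == 255 ? "\n" : (n % 4 == 3 ? ",\n" : ", "));
        }
        sb.append("};\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(generate("crc32c_table", 0x82F63B78));
    }
}
```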

bq. hdfs/build.xml                                     |    1 +
- a change (possibly a bad idea) that lets hdfs pick up the native libraries 
from the common build

bq. .../java/org/apache/hadoop/hdfs/BlockReader.java   |  330 +++++--------
- rewrites BlockReader so it no longer inherits from FSInputChecker, which makes 
it much simpler
- now calls the "bulk verify" method in DataChecksum


bq. .../org/apache/hadoop/hdfs/DFSOutputStream.java    |    5 +-
- change default checksum to CRC32C

bq. .../org/apache/hadoop/hdfs/DFSInputStream.java     |    8 +-
bq. .../hadoop/hdfs/server/common/JspHelper.java       |    7 +-
- use IOUtils.readFully instead of code duplicated from BlockReader
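
For reference, the "read until the buffer is full" loop being deduplicated has roughly this shape. The sketch uses only java.io and is not a copy of Hadoop's IOUtils.readFully, whose exact behavior and messages may differ.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch of a readFully helper. */
final class ReadFullySketch {
    private ReadFullySketch() {}

    /** Reads exactly len bytes into buf[off..off+len), or throws EOFException. */
    static void readFully(InputStream in, byte[] buf, int off, int len) throws IOException {
        while (len > 0) {
            // read() may return fewer bytes than requested, so keep looping.
            int n = in.read(buf, off, len);
            if (n < 0) {
                throw new EOFException("Premature EOF: " + len + " bytes still expected");
            }
            off += n;
            len -= n;
        }
    }
}
```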

bq. .../hadoop/hdfs/server/datanode/BlockReceiver.java |    3 +-
- don't assume checksum is always CRC32 for partial chunk append
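
The idea behind this fix: the checksum over a partially written last chunk has to be recomputed with whatever algorithm the block was written with, not a hardcoded CRC32. A sketch under assumed names (the ChecksumType enum and helper are hypothetical; the JDK's CRC32C needs Java 9+):

```java
import java.util.zip.Checksum;

/** Hypothetical sketch of recomputing a partial chunk's checksum on append. */
final class PartialChunkAppend {
    /** Hypothetical type tags; the real ones are defined in DataChecksum. */
    enum ChecksumType { CRC32, CRC32C }

    private PartialChunkAppend() {}

    static Checksum newChecksum(ChecksumType t) {
        switch (t) {
            case CRC32:
                return new java.util.zip.CRC32();
            case CRC32C:
                return new java.util.zip.CRC32C();
            default:
                throw new IllegalArgumentException("Unknown checksum type " + t);
        }
    }

    /**
     * Checksum for a chunk whose first part is already on disk and whose
     * remainder arrives with the append, using the block's own checksum type.
     */
    static int checksumAfterAppend(ChecksumType blockType, byte[] onDisk, byte[] appended) {
        Checksum sum = newChecksum(blockType);
        sum.update(onDisk, 0, onDisk.length);
        sum.update(appended, 0, appended.length);
        return (int) sum.getValue();
    }
}
```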

> Speed up DFS read path
> ----------------------
>
>                 Key: HDFS-2080
>                 URL: https://issues.apache.org/jira/browse/HDFS-2080
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.23.0
>
>         Attachments: hdfs-2080.txt
>
>
> I've developed a series of patches that speed up the HDFS read path by about 
> 2.5x (from ~300MB/sec to ~800MB/sec for localhost reads from the buffer cache) 
> and will also make it easier for advanced users (e.g. HBase) to skip a buffer 
> copy. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
