[ https://issues.apache.org/jira/browse/HADOOP-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-5598:
--------------------------------

    Attachment: crc32-results.txt
                TestCrc32Performance.java
                hadoop-5598.txt

This is a patch that implements CRC32 in pure Java, along with a performance 
test that demonstrates the improvement. I'm also attaching the benchmark output 
from both Sun JDK 1.6.0_12 and OpenJDK 1.6.0_0-b12; the results differ 
considerably between the two.
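
The patch itself is in hadoop-5598.txt and is not reproduced here. For readers 
unfamiliar with the technique, the following is a minimal sketch of the classic 
byte-at-a-time, table-driven CRC-32, assuming the same reflected polynomial 
(0xEDB88320) that java.util.zip.CRC32 uses. The class name PureJavaCrc32Sketch 
is hypothetical, and this is not necessarily how the patch is written.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/**
 * Hypothetical byte-at-a-time, table-driven CRC-32 using the reflected
 * polynomial 0xEDB88320 (the same CRC as java.util.zip.CRC32).
 * A sketch only; not the code attached to HADOOP-5598.
 */
public class PureJavaCrc32Sketch implements Checksum {
  // 256-entry lookup table: CRC contribution of each possible byte value.
  private static final int[] TABLE = new int[256];
  static {
    for (int n = 0; n < 256; n++) {
      int c = n;
      for (int k = 0; k < 8; k++) {
        // Unsigned shift right; XOR in the polynomial when the low bit was set.
        c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
      }
      TABLE[n] = c;
    }
  }

  // Running CRC in the pre-inverted form (CRC-32 starts from all ones).
  private int crc = 0xFFFFFFFF;

  @Override
  public void update(int b) {
    // Only the low 8 bits of b are used.
    crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
  }

  @Override
  public void update(byte[] b, int off, int len) {
    int c = crc; // work on a local copy of the field inside the loop
    for (int i = off; i < off + len; i++) {
      // & 0xFF removes the high bits set by sign extension when b[i] is widened to int.
      c = (c >>> 8) ^ TABLE[(c ^ b[i]) & 0xFF];
    }
    crc = c;
  }

  @Override
  public long getValue() {
    // Final inversion, then widen to long without sign extension.
    return (~crc) & 0xFFFFFFFFL;
  }

  @Override
  public void reset() {
    crc = 0xFFFFFFFF;
  }

  public static void main(String[] args) {
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    PureJavaCrc32Sketch mine = new PureJavaCrc32Sketch();
    mine.update(data, 0, data.length);
    CRC32 jdk = new CRC32();
    jdk.update(data, 0, data.length);
    // Both should print the standard CRC-32 check value: cbf43926 cbf43926
    System.out.printf("%08x %08x%n", mine.getValue(), jdk.getValue());
  }
}
```

Because it implements java.util.zip.Checksum, a class like this can be swapped 
in wherever code already accepts a Checksum, and the whole inner loop stays in 
Java with no native call per update.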

The summary is that on Sun's JDK (which most people use), the pure Java 
implementation is faster for all chunk sizes below 32 bytes (by a large factor 
at the small end of the range) and about 33% slower for larger chunk sizes. On 
OpenJDK, the CRC32 implementation is 3-4x faster than on the Sun JDK.
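
The chunk-size dependence described above can be explored with a harness along 
these lines. This is not TestCrc32Performance.java; it is a rough sketch that 
feeds the same buffer to java.util.zip.CRC32 and to a small inline table-driven 
CRC-32 in chunks of a given size and reports elapsed time. ChunkedCrcBench, 
TableCrc32, crcInChunks and timeNanos are hypothetical names. Single-run timings 
from a loop like this are noisy (JIT warm-up, GC), and no results are claimed here.

```java
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChunkedCrcBench {
  /** Hypothetical minimal table-driven CRC-32, for comparison only. */
  static final class TableCrc32 implements Checksum {
    private static final int[] T = new int[256];
    static {
      for (int n = 0; n < 256; n++) {
        int c = n;
        for (int k = 0; k < 8; k++) c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
        T[n] = c;
      }
    }
    private int crc = 0xFFFFFFFF;
    public void update(int b) { crc = (crc >>> 8) ^ T[(crc ^ b) & 0xFF]; }
    public void update(byte[] b, int off, int len) {
      int c = crc;
      for (int i = off; i < off + len; i++) c = (c >>> 8) ^ T[(c ^ b[i]) & 0xFF];
      crc = c;
    }
    public long getValue() { return (~crc) & 0xFFFFFFFFL; }
    public void reset() { crc = 0xFFFFFFFF; }
  }

  // Written after each timing run so the JIT cannot discard the computed CRCs.
  static volatile long blackhole;

  /** Feeds data to sum in chunkSize-byte pieces and returns the final CRC. */
  static long crcInChunks(Checksum sum, byte[] data, int chunkSize) {
    sum.reset();
    for (int off = 0; off < data.length; off += chunkSize) {
      sum.update(data, off, Math.min(chunkSize, data.length - off));
    }
    return sum.getValue();
  }

  /** Elapsed nanoseconds for `rounds` chunked passes over data (one noisy sample). */
  static long timeNanos(Checksum sum, byte[] data, int chunkSize, int rounds) {
    long start = System.nanoTime();
    long sink = 0;
    for (int r = 0; r < rounds; r++) sink ^= crcInChunks(sum, data, chunkSize);
    long elapsed = System.nanoTime() - start;
    blackhole = sink;
    return elapsed;
  }

  public static void main(String[] args) {
    byte[] data = new byte[1 << 18];
    new Random(1).nextBytes(data);
    Checksum jdk = new CRC32();
    Checksum pure = new TableCrc32();
    for (int chunk : new int[] {1, 4, 16, 32, 64, 512, 4096}) {
      // Warm up both code paths before the timed runs.
      timeNanos(jdk, data, chunk, 3);
      timeNanos(pure, data, chunk, 3);
      long tJdk = timeNanos(jdk, data, chunk, 10);
      long tPure = timeNanos(pure, data, chunk, 10);
      System.out.printf("chunk=%5d  jdk=%8.2f ms  pure=%8.2f ms%n",
          chunk, tJdk / 1e6, tPure / 1e6);
    }
  }
}
```

The per-update cost of each implementation dominates at small chunk sizes, while 
the per-byte cost dominates at large ones, which is the kind of crossover the 
attached results describe. A proper harness would repeat runs and report 
distributions rather than a single sample.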

Running the concurrency benchmark from HADOOP-5318 with the pure Java CRC32 
also shows large improvements (the same as those seen with Ben's buffering 
patch). This patch includes the change to FSDataOutputStream that makes use of it.

I'd appreciate a review from someone who understands Java's sign-extension 
semantics better than I do. I bet a Java bitwise-op master could squeeze more 
performance out of this.
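
For reference, the usual pitfalls in bitwise CRC code in Java are that byte is 
signed (so widening a byte to int sign-extends it), that >> copies the sign bit 
while >>> shifts in zeros, and that an int CRC must be masked when widened to 
long. A small illustration follows; the class and method names are hypothetical.

```java
public class SignExtensionDemo {
  /** Default widening: a negative byte is sign-extended. */
  static int widen(byte b) { return b; }

  /** Widening that treats the byte as unsigned (0..255). */
  static int widenUnsigned(byte b) { return b & 0xFF; }

  /** Table index as in a byte-at-a-time CRC: low 8 bits of crc ^ b. */
  static int tableIndex(int crc, byte b) { return (crc ^ b) & 0xFF; }

  /** Reinterprets a 32-bit CRC as an unsigned value held in a long. */
  static long toUnsignedLong(int crc) { return crc & 0xFFFFFFFFL; }

  public static void main(String[] args) {
    byte b = (byte) 0xF0; // stored as -16

    // prints: widened=fffffff0 masked=000000f0
    System.out.printf("widened=%08x masked=%08x%n", widen(b), widenUnsigned(b));

    int x = 0x80000000;
    System.out.printf(">>  gives %08x%n", x >> 8);  // copies sign bit: ff800000
    System.out.printf(">>> gives %08x%n", x >>> 8); // shifts in zeros: 00800000

    // Without & 0xFF, (0x12345678 ^ b) is negative and would throw
    // ArrayIndexOutOfBoundsException when used to index a 256-entry table.
    System.out.println("unmasked=" + (0x12345678 ^ b)
        + " masked=" + tableIndex(0x12345678, b)); // masked=136

    int rawCrc = 0xCBF43926;
    // prints: -873187034 vs 3421780262
    System.out.println((long) rawCrc + " vs " + toUnsignedLong(rawCrc));
  }
}
```

Masking with & 0xFF (or & 0xFFFFFFFFL) and using >>> consistently avoids all 
three problems; these masks are typically cheap compared with a table lookup.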

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-5598
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5598
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>            Assignee: Owen O'Malley
>         Attachments: crc32-results.txt, hadoop-5598.txt, TestCrc32Performance.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spend a long 
> time in CRC calculation: 5 of the 6 seconds taken by the write. I suspect 
> that crossing the Java/JNI boundary is what is causing us grief.

-- 
This message is automatically generated by JIRA.