[ https://issues.apache.org/jira/browse/HADOOP-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12720996#action_12720996 ]
Todd Lipcon commented on HADOOP-5598:
-------------------------------------

Scott: I just tried your version and was unable to get the same performance improvements. I think we've established that the pure Java version definitely wins on small blocks. For large blocks, I'm seeing the following on my laptop (64-bit, with a 64-bit JRE):

My most recent non-evil pure Java: 250M/sec
Scott's patch that unrolls the loop: 260-280M/sec
Sun Java 1.6 update 14: 333M/sec
OpenJDK 1.6: 795M/sec

The OpenJDK implementation simply wraps zlib's crc32 routine, which must be highly optimized. Given that we already have a JNI library for native compression using zlib, I'd like to simply add a stub to libhadoop that wraps zlib's crc32. That should give us the same ~800M/sec throughput for large blocks. Since we implement the stub ourselves, we also have the ability to switch to pure Java for small sizes and get the 20x speedup there, with no adversarial workloads causing bad performance.

On systems where the native code isn't available, we can simply use the pure Java implementation for all sizes, since at worst it's only slightly slower than java.util.zip.CRC32 and at best it's 30x faster. I imagine that most production systems are using libhadoop, or at least could easily get it deployed if it were shown to have significant performance benefits.

I'll upload a patch later this evening for this.

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-5598
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5598
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>            Assignee: Todd Lipcon
>         Attachments: crc32-results.txt, hadoop-5598-evil.txt, hadoop-5598-hybrid.txt, hadoop-5598.txt, hadoop-5598.txt, PureJavaCrc32.java, TestCrc32Performance.java, TestCrc32Performance.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a long time in crc calculation. In particular, it was spending 5 seconds in crc calculation out of a total of 6 for the write. I suspect that it is the java-jni border that is causing us grief.
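(Editor's note: for readers following the discussion above, the selection logic Todd describes -- native zlib-backed CRC for large blocks, pure Java for small blocks, and pure Java everywhere when libhadoop isn't loaded -- could look roughly like the sketch below. The class name, threshold value, and factory method are hypothetical and are not taken from the actual HADOOP-5598 patch; java.util.zip.CRC32 stands in for the proposed libhadoop JNI stub around zlib's crc32.)

    import java.util.zip.CRC32;
    import java.util.zip.Checksum;

    public class Crc32Factory {

      // Hypothetical crossover point; the real value would have to come
      // from benchmarks like TestCrc32Performance.
      private static final int NATIVE_THRESHOLD = 4096;

      /**
       * Pick a CRC32 implementation for a stream whose checksum chunks
       * are bytesPerChecksum bytes long.
       */
      public static Checksum newCrc32(int bytesPerChecksum, boolean nativeAvailable) {
        if (nativeAvailable && bytesPerChecksum >= NATIVE_THRESHOLD) {
          // Stand-in for a JNI stub in libhadoop that calls zlib's crc32();
          // OpenJDK's java.util.zip.CRC32 already delegates to zlib.
          return new CRC32();
        }
        // PureJavaCrc32 is the class attached to this issue, assumed here
        // to implement java.util.zip.Checksum.
        return new PureJavaCrc32();
      }
    }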