[ 
https://issues.apache.org/jira/browse/HADOOP-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053897#comment-14053897
 ] 

Todd Lipcon commented on HADOOP-10778:
--------------------------------------

It would also depend a lot on the version of zlib that you've got on your 
system. The CRC32 implementation in java.util.zip is just a wrapper around 
zlib's crc32 function, even in the latest Java: 
https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/native/java/util/zip/CRC32.c

So, we should probably run this benchmark on a system which is representative 
of customer servers, not just an OSX laptop. Have you had a chance to try it on 
e.g. a Sandy Bridge server running RHEL 6?

The other factor which isn't captured by the benchmark is the cost of the JVM 
critical section ("GetPrimitiveArrayCritical"). While in such a critical 
section, GCs are blocked, and any request to start a GC will end up blocking 
all threads until the thread within the critical section exits. This can 
increase GC pause times significantly if you are making CRC calls on large 
buffers. If I recall correctly, this was one of the major reasons to switch to 
the pure Java CRC32 -- not just pure throughput.

The new work that [~james.thomas] is doing with native CRC avoids the above 
problem by splitting the CRC calculation into smaller chunks -- the same trick 
that the JVM uses when memcpying large byte[] arrays. This avoids long critical 
sections and the above-mentioned problem where all threads block while entering 
a minor GC.
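
To illustrate the chunking idea only, here is a rough Java-level sketch. It is 
not the native patch: the real work does the chunking inside the JNI code, 
whereas this just feeds java.util.zip.CRC32 in bounded pieces. The class name, 
method name, and the 64 KB chunk size are all made up for the example. The 
point is that the CRC is incremental, so the chunked result is identical to a 
single call over the whole buffer, while each call touches a bounded number of 
bytes.

```java
import java.util.Random;
import java.util.zip.CRC32;

public class ChunkedCrc {
    // Hypothetical chunk size, chosen only for illustration.
    static final int CHUNK_SIZE = 64 * 1024;

    /** Computes a CRC32 over buf by feeding the checksum in bounded chunks. */
    static long chunkedCrc32(byte[] buf, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        CRC32 crc = new CRC32();
        for (int off = 0; off < buf.length; off += chunkSize) {
            int len = Math.min(chunkSize, buf.length - off);
            // Each update() call covers at most chunkSize bytes, which
            // bounds the amount of work done per call.
            crc.update(buf, off, len);
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = new byte[8 * 1024 * 1024];
        new Random(42).nextBytes(data);

        // Reference value: one update() over the whole buffer.
        CRC32 whole = new CRC32();
        whole.update(data, 0, data.length);

        long chunked = chunkedCrc32(data, CHUNK_SIZE);
        System.out.println("checksums match: " + (whole.getValue() == chunked));
    }
}
```

The chunk size is the knob here: smaller chunks mean shorter individual calls 
but more per-call overhead, so the right value would have to come from 
measurement on representative hardware.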

> Use NativeCrc32 only if it is faster
> ------------------------------------
>
>                 Key: HADOOP-10778
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10778
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: util
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: c10778_20140702.patch
>
>
> From the benchmark post in [this 
> comment|https://issues.apache.org/jira/browse/HDFS-6560?focusedCommentId=14044060&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14044060],
>  NativeCrc32 is slower than java.util.zip.CRC32 for Java 7 and above when 
> bytesPerChecksum > 512.



