[
https://issues.apache.org/jira/browse/HADOOP-10778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14053897#comment-14053897
]
Todd Lipcon commented on HADOOP-10778:
--------------------------------------
It would also depend a lot on the version of zlib that you've got on your
system. The CRC32 implementation in java.util.zip is just a wrapper around
zlib's crc32 function, even in the latest Java:
https://github.com/openjdk-mirror/jdk7u-jdk/blob/master/src/share/native/java/util/zip/CRC32.c
So, we should probably run this benchmark on a system which is representative
of customer servers, not just an OS X laptop. Have you had a chance to try it
on, e.g., a Sandy Bridge server running RHEL 6?
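
For comparing machines, a rough throughput harness along these lines might be
a starting point. It is only a sketch: the class name Crc32Throughput and its
parameters are made up for illustration, it times java.util.zip.CRC32 alone
(not Hadoop's NativeCrc32 or the pure Java CRC32), and a serious comparison
would want a proper harness such as JMH.

```java
import java.util.Random;
import java.util.zip.CRC32;

// Hypothetical helper for illustration; not part of Hadoop or the attached patch.
final class Crc32Throughput {
    private Crc32Throughput() {}

    /**
     * Returns an approximate throughput in MB/s for checksumming totalBytes
     * of random data, bytesPerChecksum bytes at a time, repeated rounds times.
     */
    static double measureMBps(int bytesPerChecksum, int totalBytes, int rounds) {
        byte[] data = new byte[totalBytes];
        new Random(0).nextBytes(data);
        CRC32 crc = new CRC32();
        long sink = 0;

        // Untimed warm-up passes so the JIT has a chance to compile the loop.
        for (int r = 0; r < rounds; r++) {
            sink += pass(crc, data, bytesPerChecksum);
        }

        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            sink += pass(crc, data, bytesPerChecksum);
        }
        long elapsedNanos = Math.max(1L, System.nanoTime() - start);

        // Use the accumulated value so the checksum work cannot be optimized away.
        if (sink == Long.MIN_VALUE) {
            System.out.println("unlikely sink value: " + sink);
        }

        double megabytes = (double) totalBytes * rounds / (1024.0 * 1024.0);
        return megabytes / (elapsedNanos / 1e9);
    }

    // One pass over the buffer, producing one checksum per bytesPerChecksum bytes.
    private static long pass(CRC32 crc, byte[] data, int bytesPerChecksum) {
        long sum = 0;
        for (int off = 0; off < data.length; off += bytesPerChecksum) {
            int n = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, n);
            sum += crc.getValue();
        }
        return sum;
    }
}
```

Running it with bytesPerChecksum values on both sides of 512 (for example 512
and 4096) on the target server would show whether the laptop numbers carry
over.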
The other factor which isn't captured by the benchmark is the cost of the JVM
critical section ("GetPrimitiveArrayCritical"). While in such a critical
section, GCs are blocked, and any request to start a GC will end up blocking
all threads until the thread within the critical section exits. This can
lengthen GC pause times significantly if you are making CRC calls on large
buffers. This was one of the major reasons to switch to the pure Java CRC32 if
I recall correctly -- not just pure throughput.
The new work that [~james.thomas] is doing with native CRC avoids the above
problem by splitting the CRC calculation into smaller chunks -- the same trick
that the JVM uses when memcpying large byte[] arrays. This avoids long critical
sections and the above-mentioned problem where all threads block while entering
a minor GC.
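
To illustrate the chunking idea only (this is not the native code from that
work), here is a minimal Java sketch that feeds a large buffer to
java.util.zip.CRC32 in bounded pieces, so that no single update() call covers
more than chunkSize bytes. The class name ChunkedCrc32 and the chunkSize
parameter are made up for this example, and whether a given JDK's update()
actually enters a critical section depends on the JVM version.

```java
import java.util.zip.CRC32;

// Hypothetical helper for illustration; not part of Hadoop or the native CRC work.
final class ChunkedCrc32 {
    private ChunkedCrc32() {}

    /**
     * Computes the CRC32 of buf[off, off + len), passing at most chunkSize
     * bytes to each update() call. The result equals that of a single
     * update() over the whole range, because CRC32 is computed incrementally.
     */
    static long checksum(byte[] buf, int off, int len, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        CRC32 crc = new CRC32();
        int pos = off;
        int remaining = len;
        while (remaining > 0) {
            int n = Math.min(chunkSize, remaining);
            // Each call is bounded, so any per-call blocking is bounded too.
            crc.update(buf, pos, n);
            pos += n;
            remaining -= n;
        }
        return crc.getValue();
    }
}
```

The chunk size is a tradeoff: smaller chunks bound the time spent per call
more tightly, at the cost of more call overhead.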
> Use NativeCrc32 only if it is faster
> ------------------------------------
>
> Key: HADOOP-10778
> URL: https://issues.apache.org/jira/browse/HADOOP-10778
> Project: Hadoop Common
> Issue Type: Improvement
> Components: util
> Reporter: Tsz Wo Nicholas Sze
> Assignee: Tsz Wo Nicholas Sze
> Attachments: c10778_20140702.patch
>
>
> From the benchmark post in [this
> comment|https://issues.apache.org/jira/browse/HDFS-6560?focusedCommentId=14044060&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14044060],
> NativeCrc32 is slower than java.util.zip.CRC32 for Java 7 and above when
> bytesPerChecksum > 512.