[
https://issues.apache.org/jira/browse/HADOOP-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731752#action_12731752
]
Scott Carey commented on HADOOP-6148:
-------------------------------------
{quote}It seems to me that, on the 64-bit JVM, most of the implementations are
within margin of error at the sizes that are most often exercised (128 to 256
bytes).{quote}
What are the most common use cases, and where else should this code be used
other than HDFS? For HDFS, the default checksum block size is 512 bytes. The
bzip2 code uses its own CRC32 -- perhaps that should change. For .zip file
compression or decompression, I'm not sure what the typical use case is.
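For reference, the general technique under discussion is a table-driven CRC-32
exposed through the java.util.zip.Checksum interface. The following is only a
minimal sketch of that idea (standard reflected polynomial 0xEDB88320, the same
CRC that java.util.zip.CRC32 computes), not the attached PureJavaCrc32; the
class name SimplePureJavaCrc32 is made up here for illustration.

{code:java}
import java.util.zip.Checksum;

/** Minimal table-driven CRC-32 sketch; compatible with java.util.zip.CRC32. */
public class SimplePureJavaCrc32 implements Checksum {

  /** Lookup table: CRC contribution of each possible byte value. */
  private static final int[] TABLE = new int[256];
  static {
    for (int n = 0; n < 256; n++) {
      int c = n;
      for (int k = 0; k < 8; k++) {
        c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
      }
      TABLE[n] = c;
    }
  }

  private int crc = 0xFFFFFFFF;   // register starts inverted

  public void update(int b) {
    crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
  }

  public void update(byte[] b, int off, int len) {
    for (int i = off; i < off + len; i++) {
      crc = (crc >>> 8) ^ TABLE[(crc ^ b[i]) & 0xFF];
    }
  }

  public long getValue() {
    return (~crc) & 0xFFFFFFFFL; // final inversion, returned as unsigned
  }

  public void reset() {
    crc = 0xFFFFFFFF;
  }
}
{code}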
> Implement a pure Java CRC32 calculator
> --------------------------------------
>
> Key: HADOOP-6148
> URL: https://issues.apache.org/jira/browse/HADOOP-6148
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Todd Lipcon
> Attachments: benchmarks20090714.txt, benchmarks20090715.txt,
> crc32-results.txt, hadoop-5598-evil.txt, hadoop-5598-hybrid.txt,
> hadoop-5598.txt, hadoop-5598.txt, hdfs-297.txt, PureJavaCrc32.java,
> PureJavaCrc32.java, PureJavaCrc32.java, PureJavaCrc32.java,
> PureJavaCrc32New.java, PureJavaCrc32NewInner.java, PureJavaCrc32NewLoop.java,
> TestCrc32Performance.java, TestCrc32Performance.java,
> TestCrc32Performance.java, TestCrc32Performance.java, TestPureJavaCrc32.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a
> long time in crc calculation. In particular, it spent 5 seconds in crc
> calculation out of a total of 6 seconds for the write. I suspect that the
> Java-JNI boundary is what is causing us grief.
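As a rough illustration of how the JNI overhead can be compared, below is a
sketch of a micro-benchmark that checksums 512-byte blocks (the HDFS default
bytes-per-checksum) with the built-in java.util.zip.CRC32 and with the
SimplePureJavaCrc32 sketch above. The attached TestCrc32Performance.java is the
benchmark actually used on this issue; numbers from a simple loop like this are
only indicative.

{code:java}
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Rough sketch: time ~1 GB of data in 512-byte blocks through each Checksum. */
public class Crc32JniVsPureJava {
  public static void main(String[] args) {
    final int blockSize = 512;          // HDFS default checksum block size
    final int blocks = 2 * 1024 * 1024; // ~1 GB of data in total
    byte[] buf = new byte[blockSize];
    new Random(0).nextBytes(buf);

    time("java.util.zip.CRC32 (JNI)", new CRC32(), buf, blocks);
    time("SimplePureJavaCrc32", new SimplePureJavaCrc32(), buf, blocks);
  }

  private static void time(String name, Checksum sum, byte[] buf, int blocks) {
    long start = System.nanoTime();
    for (int i = 0; i < blocks; i++) {
      sum.reset();
      sum.update(buf, 0, buf.length);
    }
    long elapsedMs = (System.nanoTime() - start) / 1000000L;
    System.out.println(name + ": " + elapsedMs + " ms, crc="
        + Long.toHexString(sum.getValue()));
  }
}
{code}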