[ https://issues.apache.org/jira/browse/HADOOP-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731752#action_12731752 ]

Scott Carey commented on HADOOP-6148:
-------------------------------------

{quote}It seems to me that, on the 64-bit JVM, most of the implementations are 
within margin of error at the sizes that are most often exercised (128 to 256 
bytes).{quote}

What are the most common use cases, and where else should this code be used 
other than HDFS? For HDFS, the default checksum block size is 512 bytes. The 
bzip2 code uses its own CRC32 implementation -- perhaps that should change. For 
.zip file compression or decompression, I'm not sure what the typical use case 
is.
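
For reference, a rough sketch of the kind of micro-benchmark being discussed, 
using only the built-in (JNI-backed) java.util.zip.CRC32 at 128, 256, and 512 
byte buffers, might look like the following. The class name, sizes, and 
iteration count here are illustrative; this is not the attached 
TestCrc32Performance harness.

{code:java}
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Rough timing of the JNI-backed CRC32 at a few buffer sizes (illustrative only). */
public class Crc32SizeBench {
  public static void main(String[] args) {
    int[] sizes = {128, 256, 512};   // 512 bytes = default HDFS checksum chunk
    final int iterations = 1000000;  // arbitrary; long enough to dominate JIT warm-up
    Random random = new Random(0);

    for (int size : sizes) {
      byte[] buf = new byte[size];
      random.nextBytes(buf);

      Checksum crc = new CRC32();
      long start = System.nanoTime();
      for (int i = 0; i < iterations; i++) {
        crc.reset();
        crc.update(buf, 0, buf.length);
      }
      long elapsedNs = System.nanoTime() - start;

      // bytes per nanosecond * 1000 == (decimal) MB per second
      double mbPerSec = (double) size * iterations * 1000.0 / elapsedNs;
      System.out.printf("size=%4d bytes: %8.1f MB/s (crc=%d)%n", size, mbPerSec, crc.getValue());
    }
  }
}
{code}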

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-6148
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6148
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Todd Lipcon
>         Attachments: benchmarks20090714.txt, benchmarks20090715.txt, 
> crc32-results.txt, hadoop-5598-evil.txt, hadoop-5598-hybrid.txt, 
> hadoop-5598.txt, hadoop-5598.txt, hdfs-297.txt, PureJavaCrc32.java, 
> PureJavaCrc32.java, PureJavaCrc32.java, PureJavaCrc32.java, 
> PureJavaCrc32New.java, PureJavaCrc32NewInner.java, PureJavaCrc32NewLoop.java, 
> TestCrc32Performance.java, TestCrc32Performance.java, 
> TestCrc32Performance.java, TestCrc32Performance.java, TestPureJavaCrc32.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a 
> long time in crc calculation. In particular, it was spending 5 seconds in crc 
> calculation out of a total of 6 for the write. I suspect that it is the 
> java-jni border that is causing us grief.
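
For readers following along: a table-driven pure-Java CRC-32 behind the 
java.util.zip.Checksum interface could look roughly like the minimal sketch 
below. It avoids the Java/JNI crossing suspected in the description, but it is 
only an illustration of the general approach -- it is not the attached 
PureJavaCrc32.java and is not tuned the way the benchmarked implementations are.

{code:java}
import java.util.zip.Checksum;

/**
 * Minimal table-driven CRC-32 sketch (same reflected polynomial, initial value,
 * and final XOR as java.util.zip.CRC32). Illustrative only, not optimized.
 */
public class SimplePureJavaCrc32 implements Checksum {
  private static final int[] TABLE = new int[256];
  static {
    // Precompute the byte-at-a-time lookup table for polynomial 0xEDB88320.
    for (int n = 0; n < 256; n++) {
      int c = n;
      for (int k = 0; k < 8; k++) {
        c = (c & 1) != 0 ? 0xEDB88320 ^ (c >>> 1) : c >>> 1;
      }
      TABLE[n] = c;
    }
  }

  private int crc = 0xFFFFFFFF;

  public void update(int b) {
    crc = TABLE[(crc ^ b) & 0xFF] ^ (crc >>> 8);
  }

  public void update(byte[] b, int off, int len) {
    for (int i = off; i < off + len; i++) {
      crc = TABLE[(crc ^ b[i]) & 0xFF] ^ (crc >>> 8);
    }
  }

  public long getValue() {
    // Final XOR, reported as an unsigned 32-bit value like java.util.zip.CRC32.
    return (~crc) & 0xFFFFFFFFL;
  }

  public void reset() {
    crc = 0xFFFFFFFF;
  }
}
{code}

Because it implements java.util.zip.Checksum, a class of this shape could be 
dropped in wherever java.util.zip.CRC32 is constructed today.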

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
