[ 
https://issues.apache.org/jira/browse/HADOOP-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HADOOP-6148:
--------------------------------

    Attachment: hadoop-6148.txt

Attached is an updated patch. Changes from the previous version:

- Replaced the implementation with Scott's NewInner version.
- Cleaned up the performance test a bit and included it as a static nested
class of TestPureJavaCrc32. It can be run with:
java 'org.apache.hadoop.util.TestPureJavaCrc32$PerformanceTest'
- The patch also includes changes to ChecksumFileSystem and util.DataChecksum.

Note that I did not move the test out of src/test/core. That directory still
exists in the common repository, and all of the tests for common still live
there.

As for testing, I reran the unit test, temporarily modified to run several
million trials, and it passed. I also ran TestChecksumFileSystem, which passed.
I reran the performance test and confirmed that the results are consistent with
what we've been seeing all along.
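
For readers who want to see the shape of such a trial-based check, here is a
minimal sketch. It is not the code from the patch: TableCrc32Sketch is a
hypothetical, plain byte-at-a-time table-driven CRC32 (the patch uses Scott's
NewInner variant), and countMismatches is an illustrative helper that compares
it against java.util.zip.CRC32 on random buffers, offsets and lengths.

```java
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical table-driven CRC32; NOT the PureJavaCrc32 code from the patch. */
class TableCrc32Sketch implements Checksum {
  // Lookup table for the reflected CRC-32 polynomial 0xEDB88320.
  private static final int[] TABLE = new int[256];
  static {
    for (int i = 0; i < 256; i++) {
      int c = i;
      for (int k = 0; k < 8; k++) {
        c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : (c >>> 1);
      }
      TABLE[i] = c;
    }
  }

  // Running CRC, kept in its pre-inverted form.
  private int crc = 0xFFFFFFFF;

  @Override
  public void update(int b) {
    crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
  }

  @Override
  public void update(byte[] buf, int off, int len) {
    int c = crc;
    for (int i = off; i < off + len; i++) {
      c = (c >>> 8) ^ TABLE[(c ^ buf[i]) & 0xFF];
    }
    crc = c;
  }

  @Override
  public long getValue() {
    // Final inversion, returned as an unsigned 32-bit value.
    return (~crc) & 0xFFFFFFFFL;
  }

  @Override
  public void reset() {
    crc = 0xFFFFFFFF;
  }

  /** Counts disagreements with java.util.zip.CRC32 over random inputs. */
  static int countMismatches(int trials, long seed) {
    Random rnd = new Random(seed);
    Checksum mine = new TableCrc32Sketch();
    Checksum theirs = new CRC32();
    int mismatches = 0;
    for (int t = 0; t < trials; t++) {
      byte[] data = new byte[rnd.nextInt(512)];
      rnd.nextBytes(data);
      int off = data.length == 0 ? 0 : rnd.nextInt(data.length);
      int len = rnd.nextInt(data.length - off + 1);
      mine.reset();
      theirs.reset();
      mine.update(data, off, len);
      theirs.update(data, off, len);
      if (mine.getValue() != theirs.getValue()) {
        mismatches++;
      }
    }
    return mismatches;
  }

  public static void main(String[] args) {
    System.out.println("mismatches: " + countMismatches(100_000, 42L));
  }
}
```

The trial count and seed are arbitrary; raising the trial count gives the
"several million trials" style of run at the cost of a longer test.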

After this is committed, there will be a small patch to HDFS and a small patch
to MapReduce to replace java.util.zip.CRC32 with PureJavaCrc32.
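
As a rough illustration of why those follow-up patches can stay small: code
written against the java.util.zip.Checksum interface only has to change where
the checksum object is created. The sketch below is not taken from HDFS or
MapReduce; ChecksumSwapSketch and checksumOf are hypothetical names, and it
assumes the replacement class implements Checksum and has a no-arg constructor.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical caller; shows only the construction point that would change. */
class ChecksumSwapSketch {
  /** Computes a checksum using whichever implementation the factory supplies. */
  static long checksumOf(Supplier<Checksum> factory, byte[] data) {
    Checksum sum = factory.get();
    sum.update(data, 0, data.length);
    return sum.getValue();
  }

  public static void main(String[] args) {
    byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
    // Before the swap: the JDK implementation.
    long before = checksumOf(CRC32::new, data);
    // After the swap (assumption): pass the pure Java class instead,
    // e.g. checksumOf(PureJavaCrc32::new, data); the result must be identical.
    System.out.printf("%08x%n", before); // prints cbf43926
  }
}
```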

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-6148
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6148
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Todd Lipcon
>         Attachments: benchmarks20090714.txt, benchmarks20090715.txt, 
> crc32-results.txt, hadoop-5598-evil.txt, hadoop-5598-hybrid.txt, 
> hadoop-5598.txt, hadoop-5598.txt, hadoop-6148.txt, hdfs-297.txt, 
> PureJavaCrc32.java, PureJavaCrc32.java, PureJavaCrc32.java, 
> PureJavaCrc32.java, PureJavaCrc32New.java, PureJavaCrc32NewInner.java, 
> PureJavaCrc32NewLoop.java, TestCrc32Performance.java, 
> TestCrc32Performance.java, TestCrc32Performance.java, 
> TestCrc32Performance.java, TestPureJavaCrc32.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a 
> long time in CRC calculation. In particular, it was spending 5 seconds in CRC 
> calculation out of a total of 6 for the write. I suspect that it is the 
> Java/JNI boundary that is causing us grief.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
