[
https://issues.apache.org/jira/browse/HADOOP-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated HADOOP-6148:
--------------------------------
Attachment: hadoop-6148.txt
Attached is an updated patch. The changes here from the previous version:
- Replaced the implementation with Scott's PureJavaCrc32NewInner version.
- Cleaned up the performance test a bit and included it as a static inner class
of TestPureJavaCrc32. It can be run with:
java 'org.apache.hadoop.util.TestPureJavaCrc32$PerformanceTest'
- The patch also includes changes to ChecksumFileSystem and util.DataChecksum
Note that I did not move the test out of src/test/core. It looks like
src/test/core still exists in the common repository, and all of the tests for
common are still in there.
As for testing, I re-ran the unit test, temporarily modified to do several
million trials, and it passed. I also ran TestChecksumFileSystem, which passed.
I re-ran the performance test and confirmed that it was consistent with the
results we've been seeing all along.
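For readers who want to try the same style of check, here is a minimal sketch
of a randomized comparison against java.util.zip.CRC32. It is not the code in
the attached patch: SimpleCrc32Check, SimpleCrc32 and countMismatches are
hypothetical names, and SimpleCrc32 is a plain byte-at-a-time table lookup
standing in for PureJavaCrc32, which is not reproduced here.

```java
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class SimpleCrc32Check {

    /**
     * Hypothetical stand-in for a pure Java CRC32: a byte-at-a-time,
     * table-driven implementation. This is NOT the patch's PureJavaCrc32.
     */
    static final class SimpleCrc32 implements Checksum {
        private static final int[] TABLE = new int[256];
        static {
            // Build the lookup table for the reflected CRC-32 polynomial.
            for (int n = 0; n < 256; n++) {
                int c = n;
                for (int k = 0; k < 8; k++) {
                    c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
                }
                TABLE[n] = c;
            }
        }

        // Running state is kept bit-inverted, as is conventional for CRC-32.
        private int crc = 0xFFFFFFFF;

        @Override
        public void update(int b) {
            crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
        }

        @Override
        public void update(byte[] b, int off, int len) {
            for (int i = off; i < off + len; i++) {
                update(b[i]);
            }
        }

        @Override
        public long getValue() {
            return (~crc) & 0xFFFFFFFFL;
        }

        @Override
        public void reset() {
            crc = 0xFFFFFFFF;
        }
    }

    /** Runs random trials and counts inputs where the two checksums differ. */
    static int countMismatches(int trials, long seed) {
        Random rnd = new Random(seed);
        Checksum reference = new CRC32();
        Checksum candidate = new SimpleCrc32();
        int mismatches = 0;
        for (int t = 0; t < trials; t++) {
            byte[] data = new byte[rnd.nextInt(512)];
            rnd.nextBytes(data);
            reference.reset();
            candidate.reset();
            reference.update(data, 0, data.length);
            candidate.update(data, 0, data.length);
            if (reference.getValue() != candidate.getValue()) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(100_000, 42L));
    }
}
```

Because both classes are used only through the java.util.zip.Checksum
interface, the candidate implementation can be swapped without touching the
comparison loop.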
After this is committed, there will be a small patch to HDFS and a small patch
to MapReduce to replace java.util.zip.CRC32 with PureJavaCrc32.
> Implement a pure Java CRC32 calculator
> --------------------------------------
>
> Key: HADOOP-6148
> URL: https://issues.apache.org/jira/browse/HADOOP-6148
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Owen O'Malley
> Assignee: Todd Lipcon
> Attachments: benchmarks20090714.txt, benchmarks20090715.txt,
> crc32-results.txt, hadoop-5598-evil.txt, hadoop-5598-hybrid.txt,
> hadoop-5598.txt, hadoop-5598.txt, hadoop-6148.txt, hdfs-297.txt,
> PureJavaCrc32.java, PureJavaCrc32.java, PureJavaCrc32.java,
> PureJavaCrc32.java, PureJavaCrc32New.java, PureJavaCrc32NewInner.java,
> PureJavaCrc32NewLoop.java, TestCrc32Performance.java,
> TestCrc32Performance.java, TestCrc32Performance.java,
> TestCrc32Performance.java, TestPureJavaCrc32.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a
> long time in crc calculation. In particular, it was spending 5 seconds in crc
> calculation out of a total of 6 for the write. I suspect that it is the
> java-jni border that is causing us grief.