[ https://issues.apache.org/jira/browse/HADOOP-5598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Scott Carey updated HADOOP-5598:
--------------------------------

    Attachment: PureJavaCrc32.java

This version does some algebraic manipulation and is about 10% faster than the native implementation on large blocks on my machine (Java 1.6, Mac OS X, 64-bit, 2.5 GHz Core 2 Duo):

pure java   16MB block: 397.516 MB/sec
sun native  16MB block: 337.731 MB/sec

This version uses the same lookup table as the previous one, occupying 1KB. I have another pure Java version that uses four lookup tables (4KB), which I will post shortly after I clean it up. Its results for large blocks are:

pure java   16MB block: 624.390 MB/sec
sun native  16MB block: 342.246 MB/sec

It first breaks 600 MB/sec at a block size of 128 bytes and is still over 520 MB/sec at a block size of 32 bytes.

A big remaining question is performance under concurrency. The larger lookup-table footprint may bring the four-table version down a little, and any version that calls out to native code may also slow down under concurrency.

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-5598
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5598
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: Owen O'Malley
>            Assignee: Todd Lipcon
>         Attachments: crc32-results.txt, hadoop-5598-evil.txt, hadoop-5598-hybrid.txt, hadoop-5598.txt, hadoop-5598.txt, PureJavaCrc32.java, PureJavaCrc32.java, TestCrc32Performance.java, TestCrc32Performance.java
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a long time in CRC calculation. In particular, it was spending 5 seconds in CRC calculation out of a total of 6 for the write. I suspect that it is the Java-JNI border that is causing us grief.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
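
For context, the sketch below illustrates the two table-driven techniques the comment compares: the classic single 256-entry (1KB) table, and a four-table (4KB) slicing variant that consumes four input bytes per iteration. It assumes the reflected IEEE polynomial 0xEDB88320 that java.util.zip.CRC32 implements; the class and method names are illustrative, and this is not the attached PureJavaCrc32.java, only a minimal version of the standard technique such an implementation builds on.

{code:java}
// Sketch of the two table-driven CRC-32 techniques discussed above, using the
// reflected IEEE polynomial 0xEDB88320 (the one java.util.zip.CRC32 computes).
// Illustrative only; not the attached PureJavaCrc32.java.
public final class Crc32Sketch {

  private static final int POLY = 0xEDB88320;

  // T[0] is the classic 256-entry table (1KB); T[1]..T[3] extend it to the
  // four-table (4KB) "slicing" variant that consumes four bytes per step.
  private static final int[][] T = new int[4][256];
  static {
    for (int i = 0; i < 256; i++) {
      int c = i;
      for (int k = 0; k < 8; k++) {
        c = (c & 1) != 0 ? (c >>> 1) ^ POLY : c >>> 1;
      }
      T[0][i] = c;
    }
    for (int t = 1; t < 4; t++) {
      for (int i = 0; i < 256; i++) {
        T[t][i] = (T[t - 1][i] >>> 8) ^ T[0][T[t - 1][i] & 0xFF];
      }
    }
  }

  private int crc = 0xFFFFFFFF; // kept inverted between updates, as usual

  /** Single-table update: one lookup per input byte. */
  public void update(byte[] b, int off, int len) {
    int c = crc;
    for (int i = off; i < off + len; i++) {
      c = (c >>> 8) ^ T[0][(c ^ b[i]) & 0xFF];
    }
    crc = c;
  }

  /** Four-table update: four lookups per four input bytes. */
  public void updateSliced(byte[] b, int off, int len) {
    int c = crc;
    int i = off;
    for (int end = off + (len & ~3); i < end; i += 4) {
      // XOR four input bytes into the CRC (little-endian), then replace the
      // CRC with four independent table lookups, one per byte lane.
      c ^= (b[i] & 0xFF) | ((b[i + 1] & 0xFF) << 8)
         | ((b[i + 2] & 0xFF) << 16) | (b[i + 3] << 24);
      c = T[3][c & 0xFF] ^ T[2][(c >>> 8) & 0xFF]
        ^ T[1][(c >>> 16) & 0xFF] ^ T[0][c >>> 24];
    }
    for (int end = off + len; i < end; i++) { // leftover tail bytes
      c = (c >>> 8) ^ T[0][(c ^ b[i]) & 0xFF];
    }
    crc = c;
  }

  public long getValue() {
    return (~crc) & 0xFFFFFFFFL;
  }
}
{code}

Either loop is easy to sanity-check by comparing getValue() against java.util.zip.CRC32 on random input. The slicing variant trades the extra 3KB of table footprint for roughly one lookup per byte either way but far fewer loop iterations and shift/mask operations, which is consistent with the large-block numbers reported above.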