[ 
https://issues.apache.org/jira/browse/HADOOP-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz Wo (Nicholas), SZE updated HADOOP-6148:
-------------------------------------------

    Attachment: benchmarks20090715.txt

Changed the benchmark as shown below so that each run processes ~4GB of data.
{code}
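    // Buffer lengths are 2^j + k bytes for j = 10, 12, ..., 22 and k = 0..3.
    for(int j = 10; j < 24; j += 2) {
      for(int k = 0; k < 4; k++) {
        final int bytelen = (1 << j) + k;
        final byte[] b = new byte[bytelen];
        // n passes over bytelen bytes add up to ~4GB (2^32 bytes) per run.
        final int n = (int)((1L << 32) / bytelen);

        ran.nextBytes(b);
        t.tick("ran.nextBytes, bytelen=" + bytelen);

        // One rank map per buffer length, shared by the three implementations.
        final SortedMap<Long, Checksum> rank = new TreeMap<Long, Checksum>();
        test(pure, b, n, t, rank);
        test(test, b, n, t, rank);
        test(zip, b, n, t, rank);

        System.out.println("rank = " + rank);
        // The first entry has the smallest key; count a win for that Checksum.
        final Checksum c = rank.entrySet().iterator().next().getValue();
        fastest.put(c, fastest.get(c) + 1);
      }
    }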
{code}
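
For anyone who wants to try the sizing and ranking idea without the attached TestCrc32Performance.java, here is a minimal self-contained sketch. It is not the attached benchmark: the class name ChecksumBenchSketch and its helper methods are hypothetical, java.util.zip.CRC32 and java.util.zip.Adler32 are only stand-in candidates (PureJavaCrc32 and TestCrc32 are not in the JDK), and the per-run total is scaled down to 64MB so it finishes quickly. It assumes the rank map is keyed by elapsed time.

```java
import java.util.Random;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.Adler32;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class ChecksumBenchSketch {
  /** Number of update() calls so that n * bytelen is roughly totalBytes. */
  public static int iterations(long totalBytes, int bytelen) {
    return (int) (totalBytes / bytelen);
  }

  /** Runs n updates over b, records the elapsed nanoseconds in rank, returns the checksum value. */
  public static long test(Checksum c, byte[] b, int n, SortedMap<Long, Checksum> rank) {
    c.reset();
    final long start = System.nanoTime();
    for (int i = 0; i < n; i++) {
      c.update(b, 0, b.length);
    }
    // Keyed by elapsed time: two candidates with identical times would collide.
    rank.put(System.nanoTime() - start, c);
    return c.getValue();
  }

  public static void main(String[] args) {
    final long totalBytes = 1L << 26; // 64MB per run here; the loop above uses 1L << 32
    final Random ran = new Random(0);
    // Stand-in candidates only; swap in the implementations under test.
    final Checksum[] candidates = { new CRC32(), new Adler32() };

    for (int j = 10; j < 24; j += 2) {
      final int bytelen = (1 << j) + 1; // one unaligned length per size class
      final byte[] b = new byte[bytelen];
      ran.nextBytes(b);
      final int n = iterations(totalBytes, bytelen);

      final SortedMap<Long, Checksum> rank = new TreeMap<Long, Checksum>();
      for (Checksum c : candidates) {
        test(c, b, n, rank);
      }
      // The smallest key is the shortest elapsed time.
      final Checksum winner = rank.get(rank.firstKey());
      System.out.println("bytelen=" + bytelen + ", n=" + n
          + ", fastest=" + winner.getClass().getSimpleName());
    }
  }
}
```

With the full 1L << 32 total, bytelen = 1024 gives n = 4194304 passes, which is where the ~4GB per run comes from.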

benchmarks20090715.txt: new results

TestCrc32 is consistently faster than zip.CRC32, which is in turn faster than 
PureJavaCrc32.  TestCrc32 improves on PureJavaCrc32 by ~13%.
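
For context on what a pure Java CRC32 involves, here is a minimal byte-at-a-time, table-driven sketch that implements java.util.zip.Checksum. It is not the attached PureJavaCrc32.java or TestCrc32, and it makes no attempt at the optimizations being benchmarked; the class name TableCrc32Sketch is hypothetical. It uses the standard reflected CRC-32 polynomial 0xEDB88320, so its output should match java.util.zip.CRC32.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

public class TableCrc32Sketch implements Checksum {
  // Lookup table: CRC remainder for each possible byte value.
  private static final int[] TABLE = new int[256];
  static {
    for (int i = 0; i < 256; i++) {
      int c = i;
      for (int k = 0; k < 8; k++) {
        c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
      }
      TABLE[i] = c;
    }
  }

  // Running CRC register, kept in inverted form as CRC-32 requires.
  private int crc = 0xFFFFFFFF;

  @Override
  public void update(int b) {
    crc = (crc >>> 8) ^ TABLE[(crc ^ b) & 0xFF];
  }

  @Override
  public void update(byte[] b, int off, int len) {
    int local = crc;
    for (int i = off; i < off + len; i++) {
      local = (local >>> 8) ^ TABLE[(local ^ b[i]) & 0xFF];
    }
    crc = local;
  }

  @Override
  public long getValue() {
    // Final inversion, returned as an unsigned 32-bit value.
    return (~crc) & 0xFFFFFFFFL;
  }

  @Override
  public void reset() {
    crc = 0xFFFFFFFF;
  }

  public static void main(String[] args) {
    final byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);

    final Checksum mine = new TableCrc32Sketch();
    mine.update(data, 0, data.length);

    final Checksum zip = new CRC32();
    zip.update(data, 0, data.length);

    // Both values should be the standard CRC-32 check value cbf43926.
    System.out.println(Long.toHexString(mine.getValue())
        + " " + Long.toHexString(zip.getValue()));
  }
}
```

Because it implements Checksum, a class like this can be passed to the same test(..) harness as zip.CRC32.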

> Implement a pure Java CRC32 calculator
> --------------------------------------
>
>                 Key: HADOOP-6148
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6148
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Owen O'Malley
>            Assignee: Todd Lipcon
>         Attachments: benchmarks20090714.txt, benchmarks20090715.txt, 
> crc32-results.txt, hadoop-5598-evil.txt, hadoop-5598-hybrid.txt, 
> hadoop-5598.txt, hadoop-5598.txt, hdfs-297.txt, PureJavaCrc32.java, 
> PureJavaCrc32.java, PureJavaCrc32.java, PureJavaCrc32.java, 
> TestCrc32Performance.java, TestCrc32Performance.java, 
> TestCrc32Performance.java, TestPureJavaCrc32.java
>
>
> We've seen a reducer writing 200MB to HDFS with replication = 1 spending a 
> long time in CRC calculation. In particular, it was spending 5 seconds in CRC 
> calculation out of a total of 6 for the write. I suspect that it is the 
> Java-JNI boundary that is causing us grief.

