[ https://issues.apache.org/jira/browse/HADOOP-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HADOOP-7445:
--------------------------------

    Attachment: hadoop-7445.txt

Good point that we don't need the special license on the tables, since we 
generated them using your Table class. But the actual "slicing-by-8" 
implementation comes from a BSD-licensed project, so I moved that special 
license header to bulk_crc32.c.
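For readers unfamiliar with slicing-by-8: instead of one table lookup per input 
byte, it keeps eight 256-entry tables and folds eight input bytes into the CRC 
register per loop iteration. Below is a minimal pure-Java sketch of that 
technique. It is not the code in bulk_crc32.c; it assumes the standard CRC-32 
(zlib) reflected polynomial 0xEDB88320, and the class name SlicingBy8Crc32 is 
hypothetical. Its output can be checked against java.util.zip.CRC32.

```java
/** Hypothetical class: slicing-by-8 CRC-32 (zlib polynomial) in pure Java. */
final class SlicingBy8Crc32 {
  // T[k][n]: effect on the CRC register of byte value n followed by k zero bytes.
  private static final int[][] T = new int[8][256];

  static {
    for (int n = 0; n < 256; n++) {
      int c = n;
      for (int k = 0; k < 8; k++) {
        c = (c & 1) != 0 ? (c >>> 1) ^ 0xEDB88320 : c >>> 1;
      }
      T[0][n] = c; // classic byte-at-a-time table
    }
    for (int n = 0; n < 256; n++) {
      for (int k = 1; k < 8; k++) {
        // Extend each entry by pushing one more zero byte through the register.
        T[k][n] = (T[k - 1][n] >>> 8) ^ T[0][T[k - 1][n] & 0xFF];
      }
    }
  }

  /** Continues a CRC-32 over b[off..off+len); pass 0 to start a new checksum. */
  static int update(int crc, byte[] b, int off, int len) {
    int c = ~crc; // undo the final inversion applied to the previous result
    while (len >= 8) {
      // XOR the first four bytes (little-endian) into the register.
      int one = c ^ ((b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8
          | (b[off + 2] & 0xFF) << 16 | (b[off + 3] & 0xFF) << 24);
      // Byte i of the 8-byte block is followed by 7 - i more bytes: use T[7 - i].
      c = T[7][one & 0xFF] ^ T[6][(one >>> 8) & 0xFF]
          ^ T[5][(one >>> 16) & 0xFF] ^ T[4][one >>> 24]
          ^ T[3][b[off + 4] & 0xFF] ^ T[2][b[off + 5] & 0xFF]
          ^ T[1][b[off + 6] & 0xFF] ^ T[0][b[off + 7] & 0xFF];
      off += 8;
      len -= 8;
    }
    while (len-- > 0) { // tail: one byte at a time
      c = (c >>> 8) ^ T[0][(c ^ b[off++]) & 0xFF];
    }
    return ~c;
  }
}
```

For example, calling update(0, data, 0, data.length) on the ASCII bytes of 
"123456789" should yield the standard CRC-32 check value 0xCBF43926 (read as an 
unsigned 32-bit value), and passing an earlier result as crc continues the 
checksum over further data.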

This new revision also rebases on the mavenized common.

As for testing performance and correctness against the existing implementation:
- Performance-wise, we don't currently have a canned benchmark for checksum 
_verification_ performance. This patch doesn't add native checksum 
_computation_ anywhere, since the umbrella JIRA HDFS-2080 is focusing on the 
read path. I ran a benchmark of "hadoop fs -cat /dev/shm/128M /dev/shm/128M 
/dev/shm/128M [repeated 50 times]" using a ChecksumFileSystem and saw a ~60% 
speed improvement. This is a measurement of CPU overhead, since it's reading 
from a file on a RAM disk.
- Correctness-wise, the new test cases in TestDataChecksum verify both the 
native and non-native code, since they test with direct buffers as well as heap 
buffers that wrap a byte[]. If the native and non-native code disagreed, this 
test would fail for one of the two cases, since the expected checksums are 
always computed by the Java code.
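The testing argument above can be illustrated with a small sketch of chunked 
("bulk") verification: data is split into bytesPerChecksum-sized chunks, each 
with a stored 4-byte CRC, and the verifier reports the first chunk that does 
not match. Everything here is hypothetical (BulkVerifySketch, computeSums, 
verifyChunkedSums, big-endian sum storage, CRC-32 rather than any other 
polynomial); it is not Hadoop's DataChecksum API. The direct-buffer branch 
marks where the patch presumably calls native code; this sketch substitutes 
CRC32.update(ByteBuffer) (Java 8+).

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

/** Hypothetical helper illustrating chunked checksum verification. */
final class BulkVerifySketch {
  /** Computes one CRC-32 per chunk, stored as consecutive big-endian ints. */
  static ByteBuffer computeSums(byte[] data, int bytesPerChecksum) {
    int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
    ByteBuffer sums = ByteBuffer.allocate(4 * chunks);
    CRC32 crc = new CRC32();
    for (int off = 0; off < data.length; off += bytesPerChecksum) {
      crc.reset();
      crc.update(data, off, Math.min(bytesPerChecksum, data.length - off));
      sums.putInt((int) crc.getValue());
    }
    sums.flip();
    return sums;
  }

  /**
   * Returns -1 if every chunk matches its stored sum, otherwise the offset
   * (relative to data.position()) of the first corrupt chunk.
   */
  static int verifyChunkedSums(ByteBuffer data, ByteBuffer sums, int bytesPerChecksum) {
    CRC32 crc = new CRC32();
    int start = data.position();
    for (int pos = start; pos < data.limit(); pos += bytesPerChecksum) {
      int len = Math.min(bytesPerChecksum, data.limit() - pos);
      crc.reset();
      if (data.hasArray()) {
        // Heap buffer wrapping a byte[]: checksum the backing array (Java path).
        crc.update(data.array(), data.arrayOffset() + pos, len);
      } else {
        // Direct buffer: stand-in for the native path, using a view of the chunk.
        ByteBuffer chunk = data.duplicate();
        chunk.limit(pos + len);
        chunk.position(pos);
        crc.update(chunk);
      }
      int chunkIndex = (pos - start) / bytesPerChecksum;
      int expected = sums.getInt(sums.position() + 4 * chunkIndex);
      if ((int) crc.getValue() != expected) {
        return pos - start;
      }
    }
    return -1;
  }
}
```

Because the expected sums always come from the same Java computeSums path, a 
bug in only one verification branch would show up as a failure on only one 
buffer type (heap or direct), which is the property the test relies on.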

> Implement bulk checksum verification using efficient native code
> ----------------------------------------------------------------
>
>                 Key: HADOOP-7445
>                 URL: https://issues.apache.org/jira/browse/HADOOP-7445
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: native, util
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hadoop-7445.txt, hadoop-7445.txt, hadoop-7445.txt, 
> hadoop-7445.txt, hadoop-7445.txt
>
>
> Once HADOOP-7444 is implemented ("bulk" API for checksums), good performance 
> gains can be had by implementing bulk checksum operations using JNI. This 
> JIRA is to add checksum support to the native libraries. Of course if native 
> libs are not available, it will still fall back to the pure-Java 
> implementations.
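The "use native code if available, else fall back to pure Java" design in the 
issue summary can be sketched as follows. The library name 
hypothetical_crc_native, the native method nativeCrc32, and the class 
ChecksumFallbackSketch are all made up for illustration and are not Hadoop's 
NativeCodeLoader logic; the point is only that the native method is declared 
but never invoked unless System.loadLibrary succeeded.

```java
import java.util.zip.CRC32;

/** Hypothetical example of the native-with-Java-fallback pattern. */
final class ChecksumFallbackSketch {
  // Made-up library name: loading fails unless such a library is on java.library.path.
  private static final boolean NATIVE_LOADED = tryLoad("hypothetical_crc_native");

  private static boolean tryLoad(String name) {
    try {
      System.loadLibrary(name);
      return true;
    } catch (UnsatisfiedLinkError | SecurityException e) {
      return false; // no native library available: use the Java implementation
    }
  }

  // Hypothetical JNI entry point; only called when the library loaded.
  private static native int nativeCrc32(byte[] b, int off, int len);

  static boolean isNativeLoaded() {
    return NATIVE_LOADED;
  }

  /** CRC-32 of b[off..off+len), via JNI when possible, else java.util.zip.CRC32. */
  static long crc32(byte[] b, int off, int len) {
    if (NATIVE_LOADED) {
      return nativeCrc32(b, off, len) & 0xFFFFFFFFL;
    }
    CRC32 crc = new CRC32();
    crc.update(b, off, len);
    return crc.getValue();
  }
}
```

Callers see the same result either way; only the speed differs, which is why 
the fallback can be silent.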

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
