[
https://issues.apache.org/jira/browse/HADOOP-7445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Todd Lipcon updated HADOOP-7445:
--------------------------------
Attachment: hadoop-7445.txt
Good point: we don't need the special license on the tables, since we
generated them using your Table class. The actual "slicing-by-8"
implementation, however, comes from a BSD-licensed project, so I moved that
special license header to bulk_crc32.c.
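For readers unfamiliar with the technique, here is a minimal Java sketch of
slicing-by-8 for the standard CRC-32 polynomial. It is not the code in
bulk_crc32.c; the class and method names are hypothetical, and it assumes the
reflected polynomial 0xEDB88320 used by java.util.zip.CRC32. The idea is to
build eight 256-entry tables and consume eight input bytes per loop iteration.

```java
import java.util.zip.CRC32;

// Hypothetical illustration class; not part of the patch.
final class SlicingBy8Crc32 {
    private static final int POLY = 0xEDB88320; // reflected CRC-32 polynomial
    private static final int[][] T = new int[8][256];

    static {
        // T[0] is the classic byte-at-a-time table.
        for (int i = 0; i < 256; i++) {
            int c = i;
            for (int k = 0; k < 8; k++) {
                c = (c & 1) != 0 ? (c >>> 1) ^ POLY : c >>> 1;
            }
            T[0][i] = c;
        }
        // T[s] advances the CRC by one additional zero byte relative to T[s-1].
        for (int i = 0; i < 256; i++) {
            for (int s = 1; s < 8; s++) {
                int prev = T[s - 1][i];
                T[s][i] = (prev >>> 8) ^ T[0][prev & 0xff];
            }
        }
    }

    // Reads four bytes as a little-endian int.
    private static int le32(byte[] b, int off) {
        return (b[off] & 0xff)
            | (b[off + 1] & 0xff) << 8
            | (b[off + 2] & 0xff) << 16
            | (b[off + 3] & 0xff) << 24;
    }

    static long crc32(byte[] b, int off, int len) {
        int crc = 0xFFFFFFFF;
        // Main loop: eight bytes per iteration, one lookup in each table.
        while (len >= 8) {
            int one = le32(b, off) ^ crc;
            int two = le32(b, off + 4);
            crc = T[7][one & 0xff] ^ T[6][(one >>> 8) & 0xff]
                ^ T[5][(one >>> 16) & 0xff] ^ T[4][one >>> 24]
                ^ T[3][two & 0xff] ^ T[2][(two >>> 8) & 0xff]
                ^ T[1][(two >>> 16) & 0xff] ^ T[0][two >>> 24];
            off += 8;
            len -= 8;
        }
        // Tail: fall back to one byte at a time.
        while (len-- > 0) {
            crc = (crc >>> 8) ^ T[0][(crc ^ b[off++]) & 0xff];
        }
        return ~crc & 0xFFFFFFFFL;
    }

    // Convenience check against the JDK implementation.
    static boolean matchesJdk(byte[] data) {
        CRC32 ref = new CRC32();
        ref.update(data, 0, data.length);
        return ref.getValue() == crc32(data, 0, data.length);
    }
}
```

Comparing against java.util.zip.CRC32 on inputs of assorted lengths, including
lengths that are not a multiple of eight, exercises both the main loop and the
tail.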
This new revision also rebases on the mavenized common.
As for testing performance and correctness against the existing implementation:
- Performance-wise, we don't currently have a canned benchmark for measuring
checksum _verification_. This patch doesn't add native checksum _computation_
anywhere, since the umbrella JIRA HDFS-2080 is focusing on the read path. I
was able to run benchmarks of "hadoop fs -cat /dev/shm/128M /dev/shm/128M
/dev/shm/128M [repeated 50 times]" using a ChecksumFileSystem, and saw ~60%
speed improvement. This measures CPU overhead rather than disk I/O, since the
file being read lives on a RAM disk.
- Correctness-wise, the new test cases in TestDataChecksum verify both the
native and non-native code, since they test with direct buffers as well as heap
buffers that wrap a byte[]. If the native and non-native code disagreed, then
this test would fail for one of the two cases (since the checksums being
verified are always computed by the Java code).
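The shape of that correctness check can be sketched roughly as follows. This
is not the TestDataChecksum code: the class and method names are hypothetical,
and java.util.zip.CRC32 stands in for both the Java and the native paths, so
the sketch only shows the structure of the test (one set of reference
checksums, verified once through a heap buffer and once through a direct
buffer).

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical illustration class; not part of the patch.
final class BufferKindCrossCheck {

    // Reference checksums, one per chunk, computed by a single implementation.
    static long[] referenceSums(byte[] data, int bytesPerChunk) {
        int chunks = (data.length + bytesPerChunk - 1) / bytesPerChunk;
        long[] sums = new long[chunks];
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChunk;
            int len = Math.min(bytesPerChunk, data.length - off);
            CRC32 crc = new CRC32();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    // Verifies each chunk of the buffer against the expected checksums.
    // Returns the index of the first mismatching chunk, or -1 if all match.
    static int verify(ByteBuffer data, int bytesPerChunk, long[] expected) {
        ByteBuffer view = data.duplicate(); // leave the caller's position alone
        int chunk = 0;
        while (view.hasRemaining()) {
            if (chunk >= expected.length) {
                return chunk; // more data than checksums
            }
            int len = Math.min(bytesPerChunk, view.remaining());
            ByteBuffer slice = view.slice();
            slice.limit(len);
            CRC32 crc = new CRC32();
            crc.update(slice); // accepts heap and direct buffers (Java 8+)
            if (crc.getValue() != expected[chunk]) {
                return chunk;
            }
            view.position(view.position() + len);
            chunk++;
        }
        return -1;
    }

    // Copies the bytes into a direct buffer, ready for reading.
    static ByteBuffer asDirect(byte[] data) {
        ByteBuffer direct = ByteBuffer.allocateDirect(data.length);
        direct.put(data);
        direct.flip();
        return direct;
    }
}
```

A test would call verify() with ByteBuffer.wrap(data) and with asDirect(data)
against the same expected array, then flip a byte and confirm that both buffer
kinds report the same failing chunk.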
> Implement bulk checksum verification using efficient native code
> ----------------------------------------------------------------
>
> Key: HADOOP-7445
> URL: https://issues.apache.org/jira/browse/HADOOP-7445
> Project: Hadoop Common
> Issue Type: Improvement
> Components: native, util
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
> Attachments: hadoop-7445.txt, hadoop-7445.txt, hadoop-7445.txt,
> hadoop-7445.txt, hadoop-7445.txt
>
>
> Once HADOOP-7444 is implemented ("bulk" API for checksums), good performance
> gains can be had by implementing bulk checksum operations using JNI. This
> JIRA is to add checksum support to the native libraries. Of course if native
> libs are not available, it will still fall back to the pure-Java
> implementations.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira