[ https://issues.apache.org/jira/browse/HBASE-14738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell updated HBASE-14738:
-----------------------------------
    Attachment: HBASE-14738-0.98.patch

Attaching a first cut. Differences from the original commit:
- Uses CompatibilityFactory and ServiceLoader to load Hadoop-version-specific
checksum implementations.
- Moves the old checksum code out to the Hadoop 1 compat module, borrowing
Hadoop code to implement APIs in the shape of the DataChecksum APIs.
- The Hadoop 2 compat module's checksum implementation uses DataChecksum
directly, so it gets the benefit of acceleration where support is available.
- Caches checksum object instances to avoid CompatibilityFactory and
ServiceLoader overheads. (I noticed these overheads when reprofiling after the
changes.) The cache drops references to checksum objects 10 minutes after
last use.
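
For readers unfamiliar with the pattern, the ServiceLoader lookup plus
idle-expiry cache described above could be sketched roughly as follows. This
is not the code in the attached patch: ChecksumProvider, PureJavaProvider and
ChecksumCache are hypothetical names, the sketch caches the resolved provider
rather than individual checksum objects, and it falls back to
java.util.zip.CRC32 when no provider is registered under META-INF/services.

```java
import java.util.Map;
import java.util.ServiceLoader;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.LongSupplier;
import java.util.zip.CRC32;
import java.util.zip.Checksum;

/** Hypothetical service interface; the patch's real interface may differ. */
interface ChecksumProvider {
  Checksum create(String type);
}

/** Fallback used when ServiceLoader finds no registered provider. */
final class PureJavaProvider implements ChecksumProvider {
  @Override
  public Checksum create(String type) {
    return new CRC32();
  }
}

/** Caches the resolved provider and forgets it after 10 idle minutes. */
final class ChecksumCache {
  private static final long TTL_MILLIS = 10 * 60 * 1000L;

  private static final class Entry {
    final ChecksumProvider provider;
    volatile long lastUsed;

    Entry(ChecksumProvider provider, long now) {
      this.provider = provider;
      this.lastUsed = now;
    }
  }

  private final Map<String, Entry> cache = new ConcurrentHashMap<>();
  private final LongSupplier clock;

  /** Number of ServiceLoader lookups performed; for illustration only. */
  final AtomicInteger loads = new AtomicInteger();

  ChecksumCache(LongSupplier clock) {
    this.clock = clock;
  }

  Checksum get(String type) {
    long now = clock.getAsLong();
    // Drop entries that have been idle for longer than the TTL.
    cache.values().removeIf(e -> now - e.lastUsed > TTL_MILLIS);
    Entry entry = cache.computeIfAbsent(type, t -> new Entry(load(), now));
    entry.lastUsed = now;
    return entry.provider.create(type);
  }

  private ChecksumProvider load() {
    loads.incrementAndGet();
    // Take the first registered provider, if any.
    for (ChecksumProvider p : ServiceLoader.load(ChecksumProvider.class)) {
      return p;
    }
    return new PureJavaProvider();
  }
}
```

Passing the clock in as a LongSupplier (for example System::currentTimeMillis)
keeps the expiry logic easy to exercise without waiting 10 minutes.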

Profiled loading with YCSB workload A and observed the expected improvements.

I will profile the other workloads, run the full unit test suite, and report
back.

> Backport HBASE-11927 (Use Native Hadoop Library for HFile checksum) to 0.98
> ---------------------------------------------------------------------------
>
>                 Key: HBASE-14738
>                 URL: https://issues.apache.org/jira/browse/HBASE-14738
>             Project: HBase
>          Issue Type: Task
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>             Fix For: 0.98.16
>
>         Attachments: HBASE-14738-0.98.patch
>
>
> Profiling 0.98.15, I see 20-30% of CPU time spent in Hadoop's PureJavaCrc32. 
> This is not surprising given the previous results described on HBASE-11927, 
> so backport that change.
> There are two issues with the backport:
> # The patch on 11927 changes the default CRC type from CRC32 to CRC32C. 
> Although the change is backwards compatible (files with either CRC type 
> will be handled correctly in a transparent manner), we should probably leave 
> the default alone in 0.98 and advise users on a site configuration change to 
> use CRC32C if desired, for potential hardware acceleration.
> # Need a shim for the differences in Hadoop's DataChecksum type between 
> Hadoop versions.
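
To illustrate the CRC32 versus CRC32C distinction in the description above,
here is a minimal sketch of selecting a checksum by type. It is not HBase
code: ChecksumTypes and its Type enum are hypothetical, and it uses the JDK's
java.util.zip.CRC32 and java.util.zip.CRC32C (the latter needs Java 9 or
later) rather than Hadoop's DataChecksum.

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

/** Hypothetical helper; names are illustrative, not taken from HBase. */
final class ChecksumTypes {
  enum Type { CRC32, CRC32C }

  /** Picks the implementation that matches the recorded checksum type. */
  static Checksum newChecksum(Type type) {
    return type == Type.CRC32C ? new CRC32C() : new CRC32();
  }

  /** Computes the checksum of the whole array with the given type. */
  static long checksum(Type type, byte[] data) {
    Checksum c = newChecksum(type);
    c.update(data, 0, data.length);
    return c.getValue();
  }
}
```

Because the two polynomials give different values for the same bytes, a reader
has to know which type was used when the data was written; dispatching on a
recorded type, as above, is one way both kinds of file can stay readable.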



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
