[ 
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540905#comment-14540905
 ] 

Apekshit Sharma commented on HBASE-11927:
-----------------------------------------

There were a couple of options: NHL (the native Hadoop library) and 
[Circe|https://github.com/trevorr/circe].
We decided to go with NHL, even though it introduces a dependency on 
Hadoop, because the HFile checksum code requires an interface that takes two 
streams, data and checksums, and calculates or verifies checksums for 
fixed-size chunks of data. NHL already supports this, while Circe doesn't. (More 
differences are listed in this 
[doc|https://docs.google.com/document/d/1NCB3h8YU86mGFjK_uWA7KMDmu288nrCZvwRTr30zX-s/edit].)
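
To make the "two streams, fixed-size chunks" requirement concrete, here is a minimal pure-Java sketch. It is not the NHL or HBase code: the class name ChunkedChecksum and its methods are hypothetical, and it assumes Java 9+ for java.util.zip.CRC32C. One buffer holds the data, the other holds one 4-byte CRC32C per chunk of bytesPerChecksum bytes (the last chunk may be shorter).

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

public class ChunkedChecksum {

    /** Returns one 4-byte CRC32C per chunk of the data's remaining bytes. */
    public static ByteBuffer calculate(ByteBuffer data, int bytesPerChecksum) {
        ByteBuffer d = data.duplicate(); // leave the caller's position untouched
        int chunks = (d.remaining() + bytesPerChecksum - 1) / bytesPerChecksum;
        ByteBuffer sums = ByteBuffer.allocate(chunks * Integer.BYTES);
        CRC32C crc = new CRC32C();
        while (d.hasRemaining()) {
            int len = Math.min(bytesPerChecksum, d.remaining());
            ByteBuffer chunk = d.slice(); // view starting at the current position
            chunk.limit(len);             // restrict the view to this chunk
            crc.reset();
            crc.update(chunk);
            sums.putInt((int) crc.getValue());
            d.position(d.position() + len);
        }
        sums.flip();
        return sums;
    }

    /** Recomputes the per-chunk checksums and compares them with the stored ones. */
    public static boolean verify(ByteBuffer data, ByteBuffer sums, int bytesPerChecksum) {
        return calculate(data, bytesPerChecksum).equals(sums.duplicate());
    }

    public static void main(String[] args) {
        ByteBuffer data = ByteBuffer.wrap("123456789".getBytes(StandardCharsets.US_ASCII));
        ByteBuffer sums = calculate(data, 4); // chunks of 4, 4 and 1 bytes
        System.out.println("checksums stored: " + sums.remaining() / Integer.BYTES);
        System.out.println("verifies: " + verify(data, sums, 4));
    }
}
```

A native implementation would do the same work over the two buffers in a single call, which is the shape of interface the comment says NHL already offers.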

We switched the default from CRC32 to CRC32C because:
- CRC32C has better error-detection properties
- CRC32C can use a dedicated instruction on newer Intel processors
(I couldn't profile this case because the machines I used for testing weren't new 
enough, i.e. they didn't support [sse4.2|http://en.wikipedia.org/wiki/SSE4#SSE4.2] 
instructions)
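
As a small illustration that the two algorithms are not interchangeable, the sketch below runs the JDK's own java.util.zip.CRC32 and CRC32C over the same bytes. The class name CrcCompare is hypothetical and Java 9+ is assumed for CRC32C; this is not the HBase or NHL code path. The input "123456789" is the conventional check string for CRC definitions.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CRC32C;
import java.util.zip.Checksum;

public class CrcCompare {

    /** Runs the given checksum over the bytes and returns its value. */
    public static long checksum(Checksum algorithm, byte[] bytes) {
        algorithm.reset();
        algorithm.update(bytes, 0, bytes.length);
        return algorithm.getValue();
    }

    public static void main(String[] args) {
        byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
        // The two algorithms use different polynomials, so the values differ.
        System.out.printf("CRC32  = %08X%n", checksum(new CRC32(), input));  // CBF43926
        System.out.printf("CRC32C = %08X%n", checksum(new CRC32C(), input)); // E3069283
    }
}
```

Because the values differ, a reader has to know which algorithm produced a stored checksum before it can verify it.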

Profiling was done using lightweight-java-profiler.




> Use Native Hadoop Library for HFile checksum
> --------------------------------------------
>
>                 Key: HBASE-11927
>                 URL: https://issues.apache.org/jira/browse/HBASE-11927
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Apekshit Sharma
>         Attachments: HBASE-11927-v1.patch, HBASE-11927.patch, c2021.crc2.svg, 
> c2021.write.2.svg, c2021.zip.svg, compact-with-native.svg, 
> compact-without-native.svg, crc32ct.svg
>
>
> Up in hadoop they have this change. Let me publish some graphs to show that 
> it makes a difference (CRC is a massive amount of our CPU usage in my 
> profiling of an upload because of compacting, flushing, etc.).  We should 
> also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in 
> hbase but that is another issue for now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
