[
https://issues.apache.org/jira/browse/HBASE-11927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14541293#comment-14541293
]
stack commented on HBASE-11927:
-------------------------------
So, on a machine w/ hardware support, we spend 20% less CPU. Nice one [~appy].
Minor: I don't think we want to do this in HConstants.
public static ChecksumType DEFAULT_CHECKSUM_TYPE = ChecksumType.CRC32C;
HConstants is a bit of an anti-pattern. It should only hold defines that are
truly global. Better to keep constants with the code they are related to. Maybe
in ChecksumType? (I suppose we need ChecksumType? We can't use hadoop's
DataChecksum.Type? We'd break too much? Could maybe do in a followup patch).
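
For illustration, a minimal sketch of keeping the default next to the checksum
code instead of in a global constants class. SketchChecksumType and
getDefault() are hypothetical names made up for this example, not the actual
HBase ChecksumType API; it assumes Java 9+ for java.util.zip.CRC32C.

```java
import java.util.zip.CRC32;
import java.util.zip.CRC32C; // available since Java 9
import java.util.zip.Checksum;

/** Hypothetical stand-in for a checksum-type enum that owns its own default. */
enum SketchChecksumType {
    CRC32 {
        @Override
        Checksum newChecksum() {
            return new CRC32();
        }
    },
    CRC32C {
        @Override
        Checksum newChecksum() {
            return new CRC32C();
        }
    };

    /** Creates a fresh checksum calculator for this type. */
    abstract Checksum newChecksum();

    /** The default lives with the type it describes (assumed name). */
    static SketchChecksumType getDefault() {
        return CRC32C;
    }
}
```

Callers would then ask SketchChecksumType.getDefault() for the default rather
than reaching into a shared constants class.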
Nice test.
And to be clear, if an hfile was written with CRC32, we'll just read the
checksum type out of the hfile and use that when verifying, so the change to
the new checksum type should only apply to newly written files? At least that
is how I read it.
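
A rough sketch of that reading, under stated assumptions: the writer records a
checksum type code alongside the data, and the reader picks the algorithm from
the stored code rather than from the current default. The class name, the type
codes, and the byte layout below are all hypothetical and are not the real
hfile format; it assumes Java 9+ for java.util.zip.CRC32C.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;
import java.util.zip.CRC32C; // available since Java 9
import java.util.zip.Checksum;

/** Hypothetical block layout: [1-byte type code][4-byte checksum][payload]. */
final class ChecksumRoundTrip {
    static final byte TYPE_CRC32 = 1;  // assumed code, for illustration only
    static final byte TYPE_CRC32C = 2; // assumed code, for illustration only

    private ChecksumRoundTrip() {}

    /** Maps a stored type code back to a checksum implementation. */
    static Checksum forCode(byte code) {
        switch (code) {
            case TYPE_CRC32:
                return new CRC32();
            case TYPE_CRC32C:
                return new CRC32C();
            default:
                throw new IllegalArgumentException("unknown checksum type code " + code);
        }
    }

    /** Writes a block using whichever checksum type the writer was configured with. */
    static byte[] write(byte typeCode, byte[] payload) {
        Checksum c = forCode(typeCode);
        c.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(1 + 4 + payload.length);
        buf.put(typeCode).putInt((int) c.getValue()).put(payload);
        return buf.array();
    }

    /** Verifies a block using the type recorded in the block, not the current default. */
    static boolean verify(byte[] block) {
        ByteBuffer buf = ByteBuffer.wrap(block);
        byte typeCode = buf.get();
        int stored = buf.getInt();
        byte[] payload = new byte[buf.remaining()];
        buf.get(payload);
        Checksum c = forCode(typeCode);
        c.update(payload, 0, payload.length);
        return stored == (int) c.getValue();
    }
}
```

In this sketch a block written with TYPE_CRC32 still verifies after the writer
default moves to TYPE_CRC32C, because verify() only consults the stored code.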
If good, let's get this in. On commit I'll add a note to the perf section of
the refguide (unless you want to) on making sure the native libs are available
and that they are for sure working for you. We have this
http://hbase.apache.org/book.html#hadoop.native.lib but we could do better, I'd
say, if it's 20% or more.
> Use Native Hadoop Library for HFile checksum (And flip default from CRC32 to
> CRC32C)
> ------------------------------------------------------------------------------------
>
> Key: HBASE-11927
> URL: https://issues.apache.org/jira/browse/HBASE-11927
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: Apekshit Sharma
> Attachments: HBASE-11927-v1.patch, HBASE-11927-v2.patch,
> HBASE-11927-v4.patch, HBASE-11927.patch, after-compact-2%.svg,
> after-randomWrite1M-0.5%.svg, before-compact-22%.svg,
> before-randomWrite1M-5%.svg, c2021.crc2.svg, c2021.write.2.svg,
> c2021.zip.svg, crc32ct.svg
>
>
> Up in hadoop they have this change. Let me publish some graphs to show that
> it makes a difference (CRC is a massive amount of our CPU usage in my
> profiling of an upload because of compacting, flushing, etc.). We should
> also make use of native CRCings -- especially the 2.6 HDFS-6865 and ilk -- in
> hbase but that is another issue for now.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)