[
https://issues.apache.org/jira/browse/HBASE-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15964156#comment-15964156
]
Vikas Vishwakarma commented on HBASE-17877:
-------------------------------------------
I completed a few iterations today. I also added the Guava benchmark, using the
latest version, with the following changes:
1. Moved all the comparable benchmarks into JMH, as suggested by [~stack]
2. Added Blackhole.consume for all the benchmark results, as suggested by
[~Apache9] (thanks again!)
3. Used a slightly optimized random byte generation to reduce its impact on the
benchmarks: a smaller byte array is used for random byte selection and
replacement in the input arrays
4. Added the Guava benchmarks (master branch), as suggested by [~larsh] above
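To illustrate change 3, here is a minimal sketch of one way to keep input randomization cheap. It is an assumption about the approach, not the code from the patch: the class name InputRandomizer, the pool size and the seed are all hypothetical. A small pool of random bytes is generated once, and each call overwrites a single position of the input array with a byte from that pool.

```java
import java.util.Random;

// Hypothetical helper: names and sizes are illustrative, not taken from the patch.
final class InputRandomizer {
    private final byte[] pool;   // small pool of random bytes, generated once
    private final Random random;

    InputRandomizer(int poolSize, long seed) {
        this.random = new Random(seed);
        this.pool = new byte[poolSize];
        random.nextBytes(pool);  // pay the bulk generation cost up front
    }

    // Overwrites one randomly chosen position of a non-empty input array with a
    // byte taken from the pool, and returns the index that was changed.
    int perturb(byte[] input) {
        int index = random.nextInt(input.length);
        input[index] = pool[random.nextInt(pool.length)];
        return index;
    }

    // Reports whether the given byte value occurs in the pool.
    boolean poolContains(byte value) {
        for (byte b : pool) {
            if (b == value) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        InputRandomizer randomizer = new InputRandomizer(16, 42L);
        byte[] input = new byte[100];
        int changed = randomizer.perturb(input);
        System.out.println("changed index " + changed);
    }
}
```

The per-call cost is two nextInt calls and one array store, regardless of the input size, so the randomization should stay small relative to the comparison being measured.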
Observations:
While the Hadoop version gave better performance overall, its performance was
~10% lower than HBase's when byteArraySize%8 is not 0, most likely because of
the final loop that iterates over the leftover bytes one at a time. It could
also be due to some compiler optimization. If I change the leftover byte
handling in the Hadoop comparator to match HBase's, I get exactly the inverse
result, i.e. worse when byteArraySize%8 == 0 and better when byteArraySize%8 != 0.
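For readers unfamiliar with the structure being discussed, the following is a simplified sketch of an 8-bytes-at-a-time comparison followed by a byte-wise loop over the leftover bytes. It is not the Hadoop, HBase or Guava implementation: those read words through sun.misc.Unsafe, whereas this sketch assumes java.nio.ByteBuffer (big-endian by default) to stay portable, and the class name WordWiseComparator is hypothetical.

```java
import java.nio.ByteBuffer;

// Hypothetical, simplified illustration; not the code from any of the libraries.
final class WordWiseComparator {

    // Lexicographically compares two byte arrays, treating bytes as unsigned.
    static int compare(byte[] a, byte[] b) {
        int minLength = Math.min(a.length, b.length);
        int wordEnd = minLength & ~7;          // largest multiple of 8 <= minLength
        ByteBuffer bufA = ByteBuffer.wrap(a);  // big-endian, so an unsigned long
        ByteBuffer bufB = ByteBuffer.wrap(b);  // comparison matches byte order

        // Main loop: compare 8 bytes per iteration.
        for (int i = 0; i < wordEnd; i += 8) {
            long wa = bufA.getLong(i);
            long wb = bufB.getLong(i);
            if (wa != wb) {
                return Long.compareUnsigned(wa, wb);
            }
        }

        // Leftover loop: runs only when minLength % 8 != 0, one byte at a time.
        for (int i = wordEnd; i < minLength; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }

        // Equal up to the shorter length: the shorter array sorts first.
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] x = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        byte[] y = {1, 2, 3, 4, 5, 6, 7, 8, 10};
        System.out.println(Integer.signum(compare(x, y)));
    }
}
```

With a 20-byte input, for example, the main loop handles 16 bytes in two iterations and the leftover loop handles the remaining 4 bytes individually, which is the part suspected above of costing the extra time.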
However, with the Guava version I see better overall performance than with both
HBase and Hadoop. I had tried the Guava version earlier as well, using native
timestamp before/after measurements, and in that case the Hadoop comparator
gave better results in some cases. That could again be statistical variation or
related to compiler optimizations, etc. With the JMH framework, after fixing
all the initial issues related to compiler/JIT optimization (input byte array
randomization, adding Blackhole, etc.), I am seeing consistently better
benchmarks for the Guava version for all array sizes, whether byteArraySize%8
is zero or non-zero.
It looks like the Guava version on the master branch performs best, and
replacing the HBase comparator with it should give the maximum gains (pending
review):
https://github.com/google/guava/blob/master/guava/src/com/google/common/primitives/UnsignedBytes.java#L362
Results:
|---|HBase|---|---|hadoop|---|---|hadoop %diff|---|---|guava|---|---|guava %diff|---|---|
|byte array size|min|mean|max|min|mean|max|min|mean|max|min|mean|max|min|mean|max|
|4|19814.642|20217.647|20250.91|19838.782|20072.437|20090.503|0|-1|-1|24026.12|24284.021|24300.338|21|20|20|
|8|19846.598|19874.477|19881.019|22012.932|22044.713|22051.793|11|11|11|22199.453|22253.173|22261.712|12|12|12|
|16|19400.623|19430.837|19438.378|19606.912|19616.322|19649.318|1|1|1|21995.475|22113.443|22120.836|13|14|14|
|20|18456.241|18493.416|18500.289|16482.859|16705.744|16776.35|-11|-10|-9|18625.111|18660.355|18704.285|1|1|1|
|32|18953.196|18984.412|18992.993|19307.22|19345.122|19352.411|2|2|2|21309.337|21359.051|21377.868|12|13|13|
|50|17444.431|17506.91|17518.791|15864.759|15941.543|15953.412|-9|-9|-9|18468.621|18613.202|18749.651|6|6|7|
|64|17390.097|18046.898|18143.835|20152.624|20379.32|20397.359|16|13|12|21065.113|21116.799|21128.523|21|17|16|
|100|14844.718|14866.353|14889.49|13293.668|13385.7|13403.439|-10|-10|-10|15594.286|15690.369|15796.081|5|6|6|
|128|14183.991|14329.948|14351.016|17016.59|17260.48|17278.799|20|20|20|17668.509|19205.199|19333.922|25|34|35|
|200|11665.597|11732.09|11748.27|11599.469|11733.228|11755.622|-1|0|0|14540.79|14648.077|14728.363|25|25|25|
|256|10404.438|10438.019|10444.734|13205.591|13315.903|13326.772|27|28|28|14448.858|14933.008|15064.242|39|43|44|
|512|6405.106|6592.613|6604.371|9031.652|9142.564|9149.54|41|39|39|10236.501|10376.17|10389.971|60|57|57|
|1024|3812.341|3832.237|3840.291|3863.105|3864.757|3871.94|1|1|1|6911.951|7002.067|7009.792|81|83|83|
|2048|2052.148|2060.585|2061.935|2129.32|2151.807|2155.381|4|4|5|4072.481|4085.278|4089.185|98|98|98|
|4096|1073.263|1089.947|1091.566|1069.962|1076.303|1076.993|0|-1|-1|2319.74|2326.514|2328.69|116|113|113|
|8192|544.723|547.063|547.449|863.716|866.808|867.296|59|58|58|931.945|1131.406|1136.288|71|107|108|
|16384|275.155|275.724|275.909|432.556|434.158|434.698|57|57|58|582.37|584.294|584.852|112|112|112|
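The %diff columns appear to be the relative difference against the corresponding HBase value, rounded to the nearest integer. That formula is an inference from the numbers in the table, not something stated in the comment, and the class name PercentDiff is hypothetical. A small sketch that reproduces a few of the table entries under that assumption:

```java
// Hypothetical helper; the formula is inferred from the table, not from the patch.
final class PercentDiff {

    // Relative difference of candidate against baseline, in percent,
    // rounded to the nearest integer.
    static long percentDiff(double baseline, double candidate) {
        return Math.round((candidate - baseline) / baseline * 100.0);
    }

    public static void main(String[] args) {
        // Size 4, min column: HBase 19814.642 vs guava 24026.12
        System.out.println(percentDiff(19814.642, 24026.12));
        // Size 20, min column: HBase 18456.241 vs hadoop 16482.859
        System.out.println(percentDiff(18456.241, 16482.859));
    }
}
```

Under this assumption the two calls give 21 and -11, matching the guava %diff entry for size 4 and the hadoop %diff entry for size 20.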
Apologies for the multiple iterations. I am still figuring out a lot myself
while doing these microbenchmark iterations, and there are multiple dimensions
to track in the test at different levels.
> Replace/improve HBase's byte[] comparator
> -----------------------------------------
>
> Key: HBASE-17877
> URL: https://issues.apache.org/jira/browse/HBASE-17877
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Assignee: Vikas Vishwakarma
> Attachments: 17877-1.2.patch, 17877-v2-1.3.patch,
> ByteComparatorJiraHBASE-17877.pdf
>
>
> [~vik.karma] did some extensive tests and found that Hadoop's version is
> faster - dramatically faster in some cases.
> Patch forthcoming.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)