[ 
https://issues.apache.org/jira/browse/HBASE-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15962461#comment-15962461
 ] 

Vikas Vishwakarma commented on HBASE-17877:
-------------------------------------------

[~larsh] here are the updated results. 

Sorry, I figured out an issue with my JMH tests, but after fixing it the 
results are very encouraging. Initially I was using static byte arrays for the 
comparison, which I guess were being optimized internally. 

This was the old JMH benchmark code:
{code:title=OldBenchmarkCode.java|borderStyle=solid}
        ba1_8 = new byte[8];
        ba2_8 = new byte[8];
        r.nextBytes(ba1_8);
        r.nextBytes(ba2_8);

        compareToHadoop(ba1_8, 0, ba1_8.length, ba2_8, 0, ba2_8.length); {
             //hadoop comparator code
        }

        compareToHBase(ba1_8, 0, ba1_8.length, ba2_8, 0, ba2_8.length); {
             //hbase comparator code
        }
{code}

To avoid any such optimizations, I changed the new JMH benchmark code so that 
each call overwrites the byte at a random index in both byte arrays:
{code:title=NewBenchmarkCode.java|borderStyle=solid}
        ba1_8 = new byte[8];
        ba2_8 = new byte[8];
        r.nextBytes(ba1_8);
        r.nextBytes(ba2_8);

        compareToHadoop(ba1_8, 0, ba1_8.length, ba2_8, 0, ba2_8.length); {
                final int minLength = Math.min(length1, length2);
                int indx = r.nextInt(minLength);
                buffer1[indx] = (byte) 43;
                buffer2[indx] = (byte) 43;

                // hadoop comparator code
        }

        compareToHBase(ba1_8, 0, ba1_8.length, ba2_8, 0, ba2_8.length); {
                final int minLength = Math.min(length1, length2);
                int indx = r.nextInt(minLength);
                buffer1[indx] = (byte) 43;
                buffer2[indx] = (byte) 43;

                // hbase comparator code
        }
{code}
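
For anyone who wants to try the per-call mutation outside JMH, here is a minimal, self-contained sketch. The class name {{CompareSketch}}, the fixed seed, and the plain unsigned lexicographic loop are assumptions made for illustration only; the loop is a stand-in and is not the Hadoop or HBase comparator code that was benchmarked.

```java
import java.util.Random;

// Hypothetical stand-in for illustration; NOT the Hadoop or HBase comparator.
final class CompareSketch {
    // Fixed seed is an assumption, used only to make runs repeatable.
    private static final Random R = new Random(42);

    /** Plain unsigned lexicographic comparison of two byte ranges. */
    static int compareTo(byte[] buffer1, int offset1, int length1,
                         byte[] buffer2, int offset2, int length2) {
        final int minLength = Math.min(length1, length2);
        for (int i = 0; i < minLength; i++) {
            int a = buffer1[offset1 + i] & 0xff; // treat bytes as unsigned
            int b = buffer2[offset2 + i] & 0xff;
            if (a != b) {
                return a - b;
            }
        }
        return length1 - length2; // shorter range sorts first on a shared prefix
    }

    /** Same mutation as in the benchmark: one random index in both arrays set to 43. */
    static int mutateAndCompare(byte[] buffer1, byte[] buffer2) {
        final int minLength = Math.min(buffer1.length, buffer2.length);
        int indx = R.nextInt(minLength);
        buffer1[indx] = (byte) 43;
        buffer2[indx] = (byte) 43;
        return compareTo(buffer1, 0, buffer1.length, buffer2, 0, buffer2.length);
    }

    public static void main(String[] args) {
        byte[] ba1_8 = new byte[8];
        byte[] ba2_8 = new byte[8];
        R.nextBytes(ba1_8);
        R.nextBytes(ba2_8);
        System.out.println(mutateAndCompare(ba1_8, ba2_8));
    }
}
```

One thing the sketch makes visible: because the same value is written to the same index in both arrays, repeated calls on the same pair of arrays make them converge towards identical content, unless the arrays are regenerated between calls.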

With the above changes I ran 20 warmup cycles and 100 iterations of 1 second 
each for each array size (so the test duration per comparator is around 30 
mins). Now we can clearly see that the throughput in ops/ms drops as the byte 
array size increases. The results are below: the Hadoop comparator shows a very 
good improvement over the HBase comparator, except for a few cases (20, 50, 64 
and 100, where its mean throughput is about 7-12% lower). 
Iteration#1
|----|HBase (ops/ms)|----|----|Hadoop (ops/ms)|----|----|%diff|----|----|
|byte array diff index|min|mean|max|min|mean|max|min|mean|max|
|4|36948.957|37047.507|37063.599|43624.207|43720.104|43736.301|18|18|18|
|8|27884.837|34081.159|34173.034|39546.43|39653.132|39683.029|42|16|16|
|16|32994.729|33606.42|33643.392|38950.12|39033.963|39050.588|18|16|16|
|20|31131.95|31262.936|31427.434|27721.608|27900.273|27934.124|-11|-11|-11|
|32|31564.556|31713.3|31729.588|36641.596|36875.77|36908.993|16|16|16|
|50|25651.127|25704.675|25720.617|21985.286|22783.331|23810.156|-14|-11|-7|
|64|23990.409|25744.616|25817.746|22774.009|22907.009|23040.051|-5|-11|-11|
|100|19559.995|19733.446|19766.259|17116.267|18049.88|19421.504|-12|-9|-2|
|128|20541.274|20564.717|20571.537|27311.353|27444.572|27467.086|33|33|34|
|200|14356.162|14376.86|14384.074|17341.848|17946.231|18587.39|21|25|29|
|256|13319.756|13615.766|13648.414|18262.812|18328.989|18337.549|37|35|34|
|512|8022.747|8053.372|8057.757|12494.631|12560.197|12569.778|56|56|56|
|1024|4368.514|4387.346|4390.766|7049.335|7144.239|7152.564|61|63|63|
|2048|2312.296|2316.975|2318.876|3735.84|3746.904|3748.395|62|62|62|
|4096|963.396|1173.651|1177.635|1854.35|1992.96|1998.702|92|70|70|
|8192|557.483|568.487|568.982|1021.296|1028.422|1029.441|83|81|81|
|16384|270.662|300.638|301.418|512.884|515.227|515.692|89|71|71|

Iteration#2
|----|HBase (ops/ms)|----|----|Hadoop (ops/ms)|----|----|%diff|----|----|
|byte array diff index|min|mean|max|min|mean|max|min|mean|max|
|4|35456.243|37025.448|37064.285|43049.677|43680.577|43737.106|21|18|18|
|8|24971.846|33830.522|34169.057|38968.528|39633.455|39725.308|56|17|16|
|16|32733.421|32867.514|32887.865|38875.54|39031.123|39054.413|19|19|19|
|20|29543.281|31638.401|31887.656|27356.015|27902.406|27937.292|-7|-12|-12|
|32|31567.346|31707.575|31730.414|36795.993|36874.896|36905.325|17|16|16|
|50|25178.46|25716.801|25737.88|23123.396|23842.244|23954.188|-8|-7|-7|
|64|25232.908|25769.57|25790.104|23816.636|23926.496|23993.18|-6|-7|-7|
|100|18537.318|19401.317|19450.866|16679.562|17068.408|17110.839|-10|-12|-12|
|128|20516.499|20561.503|20570.657|25857.553|27023.148|27048.004|26|31|31|
|200|14368.006|14387.637|14399.758|16416.011|16620.799|16714.22|14|16|16|
|256|12164.544|13614.59|13646.761|19741.812|19873.067|19881.449|62|46|46|
|512|7949.774|7988.21|8000.377|12369.546|12555.549|12569.001|56|57|57|
|1024|4360.064|4388.154|4391.112|7058.351|7145.961|7152.739|62|63|63|
|2048|2282.581|2315.234|2318.591|3581.612|3741.999|3748.724|57|62|62|
|4096|1129.561|1141.206|1146.345|2013.874|2016.491|2017.712|78|77|76|
|8192|594.281|599.508|600.069|1022.809|1028.663|1029.265|72|72|72|
|16384|299.974|300.753|301.198|499.003|515.035|515.806|66|71|71|
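
The %diff columns are not defined in the comment itself. Judging from the numbers, they look like the relative change of the Hadoop throughput over the HBase throughput, rounded to a whole percent. A minimal sketch under that assumption (the class and method names are hypothetical):

```java
// Hypothetical helper; the formula is inferred from the tables, not stated in the comment.
final class PercentDiff {
    /** Relative change of Hadoop throughput over HBase throughput, in percent. */
    static double percentDiff(double hbaseOpsPerMs, double hadoopOpsPerMs) {
        return (hadoopOpsPerMs - hbaseOpsPerMs) / hbaseOpsPerMs * 100.0;
    }

    public static void main(String[] args) {
        // Mean values from the Iteration#2 row for 512.
        System.out.println(Math.round(percentDiff(7988.21, 12555.549))); // prints 57
    }
}
```

A negative value means the Hadoop comparator was slower for that row; for example, the Iteration#1 mean values for 20 (31262.936 vs 27900.273) give about -11.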

> Replace/improve HBase's byte[] comparator
> -----------------------------------------
>
>                 Key: HBASE-17877
>                 URL: https://issues.apache.org/jira/browse/HBASE-17877
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Lars Hofhansl
>            Assignee: Vikas Vishwakarma
>         Attachments: 17877-1.2.patch, 17877-v2-1.3.patch, 
> ByteComparatorJiraHBASE-17877.pdf
>
>
> [~vik.karma] did some extensive tests and found that Hadoop's version is 
> faster - dramatically faster in some cases.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
