[ 
https://issues.apache.org/jira/browse/HBASE-27730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801953#comment-17801953
 ] 

Becker Ewing commented on HBASE-27730:
--------------------------------------

{noformat}
Benchmark              (shouldUseNoneRecycler)  (useOffHeapBuffer)  Mode  Cnt     Score     Error  Units
ByteBuffBenchmark.get                     true                true  avgt    5  8715.026 ± 196.680  ns/op
ByteBuffBenchmark.get                     true               false  avgt    5  9156.038 ± 879.701  ns/op
ByteBuffBenchmark.get                    false                true  avgt    5  8746.112 ±  91.118  ns/op
ByteBuffBenchmark.get                    false               false  avgt    5  9067.323 ± 190.359  ns/op

Benchmark                                         (shouldUseNoneRecycler)              (vlong)  Mode  Cnt   Score   Error  Units
ReadVLongBenchmark.readVLongHBase14186_OffHeapBB                     true                    9  avgt    5   2.254 ± 0.077  ns/op
ReadVLongBenchmark.readVLongHBase14186_OffHeapBB                     true                  512  avgt    5   5.305 ± 0.169  ns/op
ReadVLongBenchmark.readVLongHBase14186_OffHeapBB                     true                80000  avgt    5   7.014 ± 0.929  ns/op
ReadVLongBenchmark.readVLongHBase14186_OffHeapBB                     true         548755813887  avgt    5   6.730 ± 0.037  ns/op
ReadVLongBenchmark.readVLongHBase14186_OffHeapBB                     true        1700104028981  avgt    5   7.497 ± 0.133  ns/op
ReadVLongBenchmark.readVLongHBase14186_OffHeapBB                     true  9123372036854775807  avgt    5  10.984 ± 0.445  ns/op
ReadVLongBenchmark.readVLongHBase14186_OffHeapBB                    false                    9  avgt    5   2.242 ± 0.031  ns/op
ReadVLongBenchmark.readVLongHBase14186_OffHeapBB                    false                  512  avgt    5   5.349 ± 0.075  ns/op
ReadVLongBenchmark.readVLongHBase14186_OffHeapBB                    false                80000  avgt    5   6.677 ± 0.041  ns/op
ReadVLongBenchmark.readVLongHBase14186_OffHeapBB                    false         548755813887  avgt    5   6.772 ± 0.528  ns/op
ReadVLongBenchmark.readVLongHBase14186_OffHeapBB                    false        1700104028981  avgt    5   7.436 ± 0.128  ns/op
ReadVLongBenchmark.readVLongHBase14186_OffHeapBB                    false  9123372036854775807  avgt    5  11.173 ± 0.217  ns/op
ReadVLongBenchmark.readVLongHBase14186_OnHeapBB                      true                    9  avgt    5   2.495 ± 0.268  ns/op
ReadVLongBenchmark.readVLongHBase14186_OnHeapBB                      true                  512  avgt    5   5.335 ± 0.166  ns/op
ReadVLongBenchmark.readVLongHBase14186_OnHeapBB                      true                80000  avgt    5   7.014 ± 0.177  ns/op
ReadVLongBenchmark.readVLongHBase14186_OnHeapBB                      true         548755813887  avgt    5   6.990 ± 0.083  ns/op
ReadVLongBenchmark.readVLongHBase14186_OnHeapBB                      true        1700104028981  avgt    5   7.085 ± 0.093  ns/op
ReadVLongBenchmark.readVLongHBase14186_OnHeapBB                      true  9123372036854775807  avgt    5  10.319 ± 0.229  ns/op
ReadVLongBenchmark.readVLongHBase14186_OnHeapBB                     false                    9  avgt    5   2.420 ± 0.126  ns/op
ReadVLongBenchmark.readVLongHBase14186_OnHeapBB                     false                  512  avgt    5   5.508 ± 0.500  ns/op
ReadVLongBenchmark.readVLongHBase14186_OnHeapBB                     false                80000  avgt    5   7.137 ± 0.108  ns/op
ReadVLongBenchmark.readVLongHBase14186_OnHeapBB                     false         548755813887  avgt    5   6.943 ± 0.040  ns/op
ReadVLongBenchmark.readVLongHBase14186_OnHeapBB                     false        1700104028981  avgt    5   6.921 ± 0.734  ns/op
ReadVLongBenchmark.readVLongHBase14186_OnHeapBB                     false  9123372036854775807  avgt    5  10.250 ± 0.364  ns/op
{noformat}

> Optimize reference counting in off-heap ByteBuff
> ------------------------------------------------
>
>                 Key: HBASE-27730
>                 URL: https://issues.apache.org/jira/browse/HBASE-27730
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Assignee: Becker Ewing
>            Priority: Major
>         Attachments: HBASE-27730-prelim.patch
>
>
> In HBASE-27710 we uncovered a performance regression in reference counting of 
> ByteBuff. This was especially prominent in on-heap buffers when doing a 
> simple HFile.Reader iteration of a file. For that case, we saw a 4x 
> regression when reference counting was in play.
> It stands to reason that this same regression exists in off-heap buffers, and 
> I've run a microbenchmark which indeed shows the same issue. With existing 
> reference counting, scanning a 20 GB HFile takes 40s. With an optimized 
> version, scanning the same file takes 20s. We don't typically see this when 
> profiling live RegionServers, where so much else goes on, but optimizing it 
> would eliminate some CPU cycles.
> It's worth noting that Netty saw this same regression a few years ago: 
> [https://github.com/netty/netty/pull/8895]. Hat tip to [~zhangduo] for 
> pointing this out.
> One of the fixes there was to copy some internal code from deeper in the ref 
> counting, so that the call stack was smaller and inlining was possible. We 
> can't really do that.
> Another thing they did was add a boolean field in their CompositeByteBuffer, 
> which gets set to true when the buffer is recycled. That way they don't need 
> to do reference counting on every operation; they can just check a boolean.
> I tried adding a boolean to our RefCnt.java, and it indeed fixes the 
> regression. The problem is that, due to object alignment in Java, adding 
> this boolean field increases the heap size of RefCnt from 24 to 32 bytes. 
> This seems non-trivial given that it's used in the bucket cache, where there 
> could be many millions of them.
> I think we can get around this by simply nulling out the recycler in RefCnt 
> after it has been called. Then, instead of doing a boolean check we can do a 
> null check. This performs similarly to the boolean, but without any extra 
> memory.
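
To make the null-check idea in the description concrete, here is a minimal sketch. It is not HBase's RefCnt: the class SketchRefCnt, the Recycler interface, and the method names are hypothetical, and making the field volatile is an assumption of this sketch rather than something the issue states. The point it illustrates is that the recycler reference doubles as the "already recycled" flag, so no extra field is needed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class RecyclerNullCheckSketch {

  /** Hypothetical stand-in for the callback that returns a buffer to its pool. */
  public interface Recycler {
    void free();
  }

  /** Simplified, hypothetical reference counter; not HBase's actual RefCnt. */
  public static final class SketchRefCnt {
    private final AtomicInteger count = new AtomicInteger(1);
    // Assumption of this sketch: volatile, so other threads see the null promptly.
    private volatile Recycler recycler;

    public SketchRefCnt(Recycler recycler) {
      if (recycler == null) {
        throw new IllegalArgumentException("recycler must not be null");
      }
      this.recycler = recycler;
    }

    /** Hot-path check: one null test instead of inspecting the reference count. */
    public void checkAccessible() {
      if (recycler == null) {
        throw new IllegalStateException("buffer already recycled");
      }
    }

    /** Adds a reference; fails if the count has already reached zero. */
    public void retain() {
      int c;
      do {
        c = count.get();
        if (c <= 0) {
          throw new IllegalStateException("buffer already recycled");
        }
      } while (!count.compareAndSet(c, c + 1));
    }

    /** Returns true if this call dropped the last reference and ran the recycler. */
    public boolean release() {
      int c = count.decrementAndGet();
      if (c < 0) {
        throw new IllegalStateException("released too many times");
      }
      if (c > 0) {
        return false;
      }
      Recycler r = recycler;
      recycler = null; // the null now marks the buffer as recycled
      r.free();
      return true;
    }
  }

  public static void main(String[] args) {
    AtomicInteger freed = new AtomicInteger();
    SketchRefCnt ref = new SketchRefCnt(freed::incrementAndGet);
    ref.retain();          // two references
    ref.checkAccessible(); // still usable
    ref.release();         // one reference left
    ref.release();         // last reference: recycler runs, field is nulled
    System.out.println("recycler calls: " + freed.get());
    try {
      ref.checkAccessible();
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

In this sketch every read-path operation would call checkAccessible(), which touches only the field that already exists, while the atomic counter is used only by retain() and release().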

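The 24-to-32-byte jump mentioned in the description follows from object size rounding. The sketch below only reproduces that arithmetic; it assumes 8-byte object alignment (the usual HotSpot default) and that the existing 24 bytes contain no padding gap a 1-byte boolean could fit into. The class and method names are hypothetical.

```java
public class ObjectSizeSketch {

  /** Rounds size up to the next multiple of alignment (alignment must be a power of two). */
  public static long alignUp(long size, long alignment) {
    return (size + alignment - 1) & ~(alignment - 1);
  }

  public static void main(String[] args) {
    long alignment = 8;      // assumed object alignment in bytes
    long currentSize = 24;   // RefCnt size reported in the issue description
    long booleanField = 1;   // a Java boolean field occupies one byte

    long withBoolean = alignUp(currentSize + booleanField, alignment);
    System.out.println("size with extra boolean: " + withBoolean + " bytes");
  }
}
```

Under these assumptions the one extra byte pushes the object past an alignment boundary, so each instance pays for a full extra 8 bytes, which is why reusing the existing recycler reference is attractive.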


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
