[jira] [Commented] (HBASE-27730) Optimize reference counting in off-heap ByteBuff

Becker Ewing (Jira) Tue, 02 Jan 2024 14:30:22 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-27730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801944#comment-17801944
 ]


Becker Ewing commented on HBASE-27730:
--------------------------------------

[~zhangduo] / [~bbeaudreault] any chance that y'all have the microbenchmarks 
lying around from the initial work here, the ones used to determine that the 
null-check solution is as fast as copying the AbstractReferenceCounted code? 
I've got some microbenchmarks that I wrote for this, but they're giving me very 
little signal (I fear that the Java compiler is doing something sneaky). I'm 
getting a lot of mileage out of the microbenchmark I wrote for readVLong in 
HBASE-28256, using the numbers there as a control, but it generally shows that 
the null-check route results in one of the following:
 * a regression in the NONE recycler case when implemented as suggested above 
(and not much performance improvement elsewhere). Going this route looks like 
it essentially nullifies the performance gains we made in HBASE-27710 for no 
real benefit. I believe this is because HBASE-27710 made it so that a volatile 
variable read isn't needed when using a NONE recycler. The implementation 
suggested above (i.e. the null-check solution) introduces a volatile variable 
read into the read hot path.
 * if you instead treat the "leak" field being null as a proxy for the NONE 
recycler being in use (so a volatile variable read isn't needed in the NONE 
recycler case), then the NONE recycler regression is fixed, but there's no 
real performance gain.
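
To make the two variants concrete, here is a minimal sketch of the hot-path check in each. It is not the actual RefCnt code or either patch: SketchRecycler, NullCheckRefCnt, LeakProxyRefCnt and isAccessible are hypothetical names, the counter is a plain AtomicInteger, and the "leak is null only for the NONE recycler" relationship is an assumption taken from the description above.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a recycler; NONE means there is nothing to free. */
interface SketchRecycler {
  SketchRecycler NONE = () -> { };

  void free();
}

/** Variant 1: null out the recycler on release and null-check it on access. */
final class NullCheckRefCnt {
  private final AtomicInteger refCnt = new AtomicInteger(1);
  // Must be volatile so a release on another thread is visible to readers.
  private volatile SketchRecycler recycler;

  NullCheckRefCnt(SketchRecycler recycler) {
    this.recycler = recycler;
  }

  /** Hot path: pays one volatile read on every call, even for NONE. */
  boolean isAccessible() {
    return recycler != null;
  }

  void release() {
    if (refCnt.decrementAndGet() == 0) {
      SketchRecycler r = recycler;
      recycler = null;
      r.free();
    }
  }
}

/** Variant 2: use a final "leak" field as a proxy for the NONE recycler. */
final class LeakProxyRefCnt {
  private final AtomicInteger refCnt = new AtomicInteger(1);
  // Assumption: leak tracking is skipped (null) only for the NONE recycler.
  private final Object leak;
  private volatile SketchRecycler recycler;

  LeakProxyRefCnt(SketchRecycler recycler) {
    this.recycler = recycler;
    this.leak = recycler == SketchRecycler.NONE ? null : new Object();
  }

  /** Hot path: the NONE case short-circuits on a final field, no volatile read. */
  boolean isAccessible() {
    return leak == null || recycler != null;
  }

  void release() {
    if (refCnt.decrementAndGet() == 0) {
      SketchRecycler r = recycler;
      recycler = null;
      r.free();
    }
  }
}
```

In this sketch the second variant never reports a NONE-backed buffer as released, which mirrors the "no check needed for NONE" behavior described above; whether that matches the real code exactly is an assumption.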

 

I can attach my benchmarking code/example patches if it would help, but given 
these results I think I'm going to try copying the AbstractReferenceCounted 
code and see whether that yields any performance improvement.

> Optimize reference counting in off-heap ByteBuff
> ------------------------------------------------
>
>                 Key: HBASE-27730
>                 URL: https://issues.apache.org/jira/browse/HBASE-27730
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Bryan Beaudreault
>            Assignee: Becker Ewing
>            Priority: Major
>
> In HBASE-27710 we uncovered a performance regression in reference counting of 
> ByteBuff. This was especially prominent in on-heap buffers when doing a 
> simple HFile.Reader iteration of a file. For that case, we saw a 4x 
> regression when reference counting was in play.
> It stands to reason that this same regression exists in off-heap buffers, and 
> I've run a microbenchmark which indeed shows the same issue. With existing 
> reference counting, scanning a 20 GB HFile takes 40s. With an optimized 
> version, scanning the same file takes 20s. We don't typically see this when 
> profiling live regionservers, where so much else goes on, but optimizing this 
> would eliminate some CPU cycles.
> It's worth noting that netty saw this same regression a few years ago: 
> [https://github.com/netty/netty/pull/8895]. Hat tip to [~zhangduo] for 
> pointing this out.
> One of the fixes there was to copy some internal code from deeper in the ref 
> counting, so that the call stack was smaller and inlining was possible. We 
> can't really do that.
> Another thing they did was add a boolean field in their CompositeByteBuffer, 
> which gets set to true when the buffer is recycled. So they don't need to do 
> reference counting on every operation, instead they can just check a boolean.
> I tried adding a boolean to our RefCnt.java, and it indeed fixes the 
> regression. The problem is that, due to object alignment in Java, adding 
> this boolean field increases the heap size of RefCnt from 24 to 32 bytes. 
> This seems non-trivial given it's used in the bucket cache, where there could 
> be many millions of them.
> I think we can get around this by simply nulling out the recycler in RefCnt 
> after it has been called. Then, instead of doing a boolean check we can do a 
> null check. This performs similarly to the boolean, but without any extra 
> memory.
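
The 24-to-32-byte jump mentioned in the description can be reproduced with simple arithmetic. The sketch below assumes a 64-bit HotSpot JVM with compressed references (12-byte object header, 4-byte references, 8-byte object alignment) and a hypothetical field layout of one int counter plus two references; the real RefCnt layout may differ and would need a tool such as JOL to confirm.

```java
/** Hypothetical arithmetic sketch; not a measurement of the real RefCnt class. */
final class RefCntLayoutSketch {

  /** Rounds header + field bytes up to the next multiple of the alignment. */
  static long alignedSize(long headerBytes, long fieldBytes, long alignment) {
    long raw = headerBytes + fieldBytes;
    return (raw + alignment - 1) / alignment * alignment;
  }

  public static void main(String[] args) {
    long header = 12;          // assumed: compressed class pointer enabled
    long fields = 4 + 4 + 4;   // assumed: int counter + two 4-byte references

    // 12 + 12 = 24, already a multiple of 8
    System.out.println(alignedSize(header, fields, 8));     // prints 24

    // one extra boolean byte: 25, padded up to 32
    System.out.println(alignedSize(header, fields + 1, 8)); // prints 32
  }
}
```

Under these assumptions a single extra byte costs a full 8 bytes per instance, which is why reusing an existing reference field for the null check avoids any growth.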



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
