[
https://issues.apache.org/jira/browse/HBASE-27730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801950#comment-17801950
]
Becker Ewing commented on HBASE-27730:
--------------------------------------
Yeah, what I have is pretty similar. I just attached a patch file
([^HBASE-27730-prelim.patch]) that has everything. The ByteBuffBenchmark class
has proven to be pretty useless so far (no real meaningful numbers), but the
readVLong performance is as I described. It's a subset of the benchmarks in
HBASE-28256, so I used the results in
https://issues.apache.org/jira/browse/HBASE-28256?focusedCommentId=17799506&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17799506
as my point of reference.
I added some comments in RefCnt.java that let you toggle between "v1" and
"v2". I'll also note that the implementations in "v1" and "v2" are somewhat
flawed, as it's possible to call Recycler#free twice (to fix the bug, you'd
need to switch to an AtomicReference<Recycler> so you could use the
"compareAndSet" method when setting "recycler" to null).
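For illustration, the AtomicReference fix mentioned above could look roughly like the sketch below. This is a hedged, self-contained mock-up, not the real HBase RefCnt class: the Recycler interface and class names here are stand-ins, and the real RefCnt extends Netty's AbstractReferenceCounted.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: hold the recycler in an AtomicReference so that
// compareAndSet lets exactly one caller win the race to run it, making a
// second (buggy) deallocation a harmless no-op.
public class RefCntSketch {
  // Stand-in for the real Recycler type.
  interface Recycler {
    void free();
  }

  private final AtomicReference<Recycler> recycler;

  RefCntSketch(Recycler r) {
    this.recycler = new AtomicReference<>(r);
  }

  // Called when the reference count drops to zero. compareAndSet
  // guarantees Recycler#free runs at most once, even under races.
  void deallocate() {
    Recycler r = recycler.get();
    if (r != null && recycler.compareAndSet(r, null)) {
      r.free();
    }
  }

  public static void main(String[] args) {
    int[] frees = {0};
    RefCntSketch cnt = new RefCntSketch(() -> frees[0]++);
    cnt.deallocate();
    cnt.deallocate(); // no-op: recycler was already swapped to null
    System.out.println(frees[0]); // prints 1
  }
}
```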
> Optimize reference counting in off-heap ByteBuff
> ------------------------------------------------
>
> Key: HBASE-27730
> URL: https://issues.apache.org/jira/browse/HBASE-27730
> Project: HBase
> Issue Type: Improvement
> Reporter: Bryan Beaudreault
> Assignee: Becker Ewing
> Priority: Major
> Attachments: HBASE-27730-prelim.patch
>
>
> In HBASE-27710 we uncovered a performance regression in reference counting of
> ByteBuff. This was especially prominent in on-heap buffers when doing a
> simple HFile.Reader iteration of a file. For that case, we saw a 4x
> regression when reference counting was in play.
> It stands to reason that this same regression exists in off-heap buffers, and
> I've run a microbenchmark which indeed shows the same issue. With existing
> reference counting, scanning a 20gb hfile takes 40s. With an optimized
> version, scanning the same file takes 20s. We don't typically see this in
> profiling live regionservers where so much else goes on, but optimizing this
> would eliminate some cpu cycles.
> It's worth noting that netty saw this same regression a few years ago:
> [https://github.com/netty/netty/pull/8895]. Hat tip to [~zhangduo] for
> pointing this out.
> One of the fixes there was to copy some internal code from deeper in the ref
> counting, so that the call stack was smaller and inlining was possible. We
> can't really do that.
> Another thing they did was add a boolean field in their CompositeByteBuffer,
> which gets set to true when the buffer is recycled. So they don't need to do
> reference counting on every operation, instead they can just check a boolean.
> I tried adding a boolean to our RefCnt.java, and it indeed fixes the
> regression. The problem is that, due to object alignment padding in Java,
> adding this boolean field increases the heap size of RefCnt from 24 to 32
> bytes. That is non-trivial given RefCnt is used in the bucket cache, where
> there could be many millions of them.
> I think we can get around this by simply nulling out the recycler in RefCnt
> after it has been called. Then, instead of doing a boolean check we can do a
> null check. This performs similarly to the boolean, but without any extra
> memory.
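The null-the-recycler idea from the description above can be sketched as follows. This is an illustrative mock-up under assumptions, not the actual HBase implementation: the class and Recycler interface are stand-ins, and (as noted in the comment thread) this plain-field version is intentionally naive about concurrent double-release.

```java
// Hedged sketch of the proposal: reuse the existing "recycler" field as the
// liveness flag by nulling it on release. Hot-path checks become a plain
// field null test, with no extra boolean field (and thus no alignment
// padding growing RefCnt from 24 to 32 bytes).
public class NullCheckRefCnt {
  // Stand-in for the real Recycler type.
  interface Recycler {
    void free();
  }

  private Recycler recycler; // non-null while live; null after release

  NullCheckRefCnt(Recycler r) {
    this.recycler = r;
  }

  // The check every ByteBuff operation would run: a cheap null comparison
  // instead of reading an atomic reference count.
  void checkRefCount() {
    if (recycler == null) {
      throw new IllegalStateException("ByteBuff already released");
    }
  }

  void release() {
    Recycler r = recycler;
    if (r != null) {
      recycler = null; // nulling doubles as the "freed" flag
      r.free();
    }
  }

  public static void main(String[] args) {
    NullCheckRefCnt cnt = new NullCheckRefCnt(() -> System.out.println("freed"));
    cnt.checkRefCount(); // passes while live
    cnt.release();       // prints "freed" and flips the flag
  }
}
```

A thread-safe variant would need the compareAndSet treatment discussed earlier in the thread, at the cost of an AtomicReference indirection.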
--
This message was sent by Atlassian Jira
(v8.20.10#820010)