[
https://issues.apache.org/jira/browse/CASSANDRA-20428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935055#comment-17935055
]
Blake Eggleston commented on CASSANDRA-20428:
---------------------------------------------
I’ll be interested to see how it goes for you. I’m still skeptical that arenas
will be the right way to go because they seem like a really coarse way of
dealing with allocations, though I’m sure there are some workarounds.
I think a cursor or flyweight approach is the right way to go, and it would have
two advantages:
1) it would allocate no more than the bare minimum of on/off heap raw bytes
(array or otherwise)
2) it would eliminate allocation of all the other Java objects that are
allocated for reads (iterators, rows, etc.), so it would improve use cases
with both large and small values. In many use cases, this is what creates most
of the GC pressure.
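
To make the cursor/flyweight idea concrete, here is a minimal sketch in Java.
It is not Cassandra code: the class names (CellCursor, CellCursorDemo) and the
cell layout (a 4-byte length followed by the value bytes) are assumptions made
purely for illustration. One cursor object is repositioned over the backing
buffer, so walking the cells allocates neither a byte[] per value nor a
row/iterator object per cell.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical flyweight cursor, not Cassandra's API. Assumes cells are laid
// out back to back as [4-byte length][value bytes] in a single buffer.
final class CellCursor {
    private final ByteBuffer data; // backing bytes, shared and never copied
    private int nextOffset;        // where the next cell's length prefix starts
    private int valueOffset;       // where the current cell's value starts
    private int valueLength;       // length of the current cell's value

    CellCursor(ByteBuffer data) {
        this.data = data;
        this.nextOffset = data.position();
    }

    // Moves to the next cell; returns false at the end. Allocates nothing.
    boolean advance() {
        if (nextOffset + Integer.BYTES > data.limit()) return false;
        valueLength = data.getInt(nextOffset);
        valueOffset = nextOffset + Integer.BYTES;
        nextOffset = valueOffset + valueLength;
        return true;
    }

    int valueLength() { return valueLength; }

    // Reads one byte of the current value in place (absolute get, no copy).
    byte valueByte(int i) {
        if (i < 0 || i >= valueLength) throw new IndexOutOfBoundsException();
        return data.get(valueOffset + i);
    }
}

final class CellCursorDemo {
    // Helper for the example: encodes strings as length-prefixed cells.
    static ByteBuffer encode(String... values) {
        int size = 0;
        for (String v : values) size += Integer.BYTES + v.getBytes(StandardCharsets.UTF_8).length;
        ByteBuffer buf = ByteBuffer.allocate(size);
        for (String v : values) {
            byte[] bytes = v.getBytes(StandardCharsets.UTF_8);
            buf.putInt(bytes.length).put(bytes);
        }
        buf.flip();
        return buf;
    }

    // Sums value lengths over all cells with one cursor and no per-cell objects.
    static long totalValueBytes(ByteBuffer buf) {
        CellCursor cursor = new CellCursor(buf);
        long total = 0;
        while (cursor.advance()) total += cursor.valueLength();
        return total;
    }
}
```

The cursor is only meaningful while the backing buffer is unchanged, which is
exactly the ownership question discussed further down.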
The cursor/flyweight approach would be about as difficult to implement as an
arena-based approach. As you know, I have a partially completed allocation-free
reader, and I can tell you that the real difficulty with eliminating that byte
array allocation isn’t so much the reworking of that core read path loop, it’s
dealing with the myriad of places where C* casually leaks objects from the read
path into various other systems.
Since the read path (and the places that touch it internally) was written
with the assumption that the JVM will manage the lifetime of all the objects
it creates, there has been no effort made to manage the ownership of objects
coming out of the read path. You allocate an array for the contents of a cell
and it will probably get thrown away immediately, but it might also live for
the next hour and there’s no way to determine that at the time of allocation.
Objects also leak out of the core read loop progressively, so it’s not like you
can just make copies onto the heap for exceptions because you’d be copying a
significant portion of what you’re reading, canceling out most of the benefit
of your work.
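
The hazard can be shown with a small, self-contained sketch. Everything here
(LeakDemo, BorrowedValue, the 16-byte reusable buffer) is hypothetical and only
illustrates the general problem, not any specific Cassandra code path: a value
that borrows bytes from a buffer the reader reuses silently changes once the
reader moves on, while the safe alternative is a heap copy, which is the very
allocation being avoided.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class LeakDemo {
    // Hypothetical view that borrows bytes from a buffer the reader will reuse.
    static final class BorrowedValue {
        private final ByteBuffer shared;
        private final int offset;
        private final int length;

        BorrowedValue(ByteBuffer shared, int offset, int length) {
            this.shared = shared;
            this.offset = offset;
            this.length = length;
        }

        // Copies the bytes to the heap: safe to keep, but it allocates.
        byte[] copy() {
            byte[] out = new byte[length];
            for (int i = 0; i < length; i++) out[i] = shared.get(offset + i);
            return out;
        }

        String asString() {
            return new String(copy(), StandardCharsets.UTF_8);
        }
    }

    // Simulates the reader overwriting its reusable buffer with the next value.
    static void fill(ByteBuffer buf, String s) {
        buf.clear();
        buf.put(s.getBytes(StandardCharsets.UTF_8));
        buf.flip();
    }

    // Returns { what the leaked view reads later, what the early copy holds }.
    static String[] run() {
        ByteBuffer reusable = ByteBuffer.allocate(16);
        fill(reusable, "first");
        BorrowedValue leaked = new BorrowedValue(reusable, 0, 5); // escapes the read loop
        byte[] owned = leaked.copy();                             // defensive heap copy
        fill(reusable, "other");                                  // reader moves on
        return new String[] { leaked.asString(), new String(owned, StandardCharsets.UTF_8) };
    }
}
```

Calling LeakDemo.run() yields "other" for the leaked view and "first" for the
copy: the view is only correct while nothing has reused the buffer, and the
copy is only cheap if it is the exception rather than the rule.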
I think any effort to make an allocation-free read and compaction path (or even
one that doesn’t allocate byte arrays) probably needs to start with creating
and enforcing a stricter barrier between the internals of the read path and the
rest of the database.
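
One possible shape for such a barrier, sketched under assumptions: the reader
hands out a view that is valid only for the duration of a callback, and
anything that must outlive the callback has to be copied explicitly. The names
below (ScopedReader, ValueView, copyToHeap) are invented for this example and
do not exist in Cassandra.

```java
import java.nio.ByteBuffer;
import java.util.function.Consumer;

// Hypothetical scoped-access reader: views are valid only inside read().
final class ScopedReader {
    static final class ValueView {
        private ByteBuffer data;
        private int offset;
        private int length;
        private boolean valid;

        int length() { check(); return length; }

        byte byteAt(int i) {
            check();
            if (i < 0 || i >= length) throw new IndexOutOfBoundsException();
            return data.get(offset + i);
        }

        // The one sanctioned way to keep bytes past the callback.
        byte[] copyToHeap() {
            check();
            byte[] out = new byte[length];
            for (int i = 0; i < length; i++) out[i] = data.get(offset + i);
            return out;
        }

        private void check() {
            if (!valid) throw new IllegalStateException("view used outside its read scope");
        }
    }

    private final ValueView view = new ValueView(); // single reused instance

    // Exposes data[offset, offset + length) to the consumer, then revokes access.
    void read(ByteBuffer data, int offset, int length, Consumer<ValueView> consumer) {
        view.data = data;
        view.offset = offset;
        view.length = length;
        view.valid = true;
        try {
            consumer.accept(view);
        } finally {
            view.valid = false; // a reference that escaped now fails loudly
            view.data = null;
        }
    }
}
```

A runtime check like this only detects leaks after the fact; it is a sketch of
the ownership rule, not a claim about how the barrier should be enforced.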
> Eliminate byte array allocation in ByteArrayAccessor.read
> ---------------------------------------------------------
>
> Key: CASSANDRA-20428
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20428
> Project: Apache Cassandra
> Issue Type: Improvement
> Reporter: Jon Haddad
> Priority: Normal
> Attachments: allocation-reverse.html,
> image-2025-03-11-11-05-55-378.png
>
>
> During compaction we allocate a new byte[] in ByteArrayAccessor.read. This
> is one of the hottest paths in the codebase, hit during writes, compaction,
> creating tables, and possibly others. In my performance tests using default
> compaction settings of 64MB I see this responsible for 40% of allocations.
> This is largely what drives GC pause frequency and duration. If we are able
> to eliminate the O(N) allocations performed here, this might be one of the
> best optimizations we could do for the number of things it touches.
> Allocation profile attached.
> !image-2025-03-11-11-05-55-378.png|width=514,height=269!
>