[ 
https://issues.apache.org/jira/browse/CASSANDRA-20428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17935055#comment-17935055
 ] 

Blake Eggleston commented on CASSANDRA-20428:
---------------------------------------------

I’ll be interested to see how it goes for you. I’m still skeptical that arenas 
will be the right way to go because they seem like a really coarse way of 
dealing with allocations, though I’m sure there are some workarounds. 

I think a cursor or flyweight approach is the right way to go, and it would 
have two advantages:

1) it would allocate no more than the bare minimum of on/off heap raw bytes 
(array or otherwise)

2) it would eliminate allocation of all the other Java objects that are 
allocated for reads (iterators, rows, etc.), so it would improve use cases 
with both large and small values. In many use cases, these objects are what 
create most of the GC pressure.
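
To make the cursor/flyweight idea concrete, here is a minimal sketch. 
Everything in it is hypothetical: CellCursor is not a Cassandra class, and the 
length-prefixed layout is an assumption chosen for brevity, not the SSTable 
format. The cursor is repositioned over a backing buffer (on or off heap) and 
exposes the current value through accessors, so iterating allocates nothing 
per cell. A byte[] is only allocated when a caller asks for a copy.

```java
import java.nio.ByteBuffer;

/** Hypothetical flyweight cursor over [int length][bytes] records. */
final class CellCursor {
    private final ByteBuffer data; // backing storage, heap or direct
    private int valueOffset;       // start of the current value
    private int valueLength;       // length of the current value
    private int nextOffset;        // start of the next record

    CellCursor(ByteBuffer data) {
        this.data = data;
    }

    /** Moves to the next cell without allocating; false at end of data. */
    boolean advance() {
        if (nextOffset + Integer.BYTES > data.limit()) {
            return false;
        }
        valueLength = data.getInt(nextOffset); // absolute read
        valueOffset = nextOffset + Integer.BYTES;
        nextOffset = valueOffset + valueLength;
        return true;
    }

    int valueLength() {
        return valueLength;
    }

    /** Reads one byte of the current value straight from the buffer. */
    byte valueByteAt(int i) {
        return data.get(valueOffset + i);
    }

    /** The only allocating call: an explicit copy the caller owns. */
    byte[] copyValue() {
        byte[] out = new byte[valueLength];
        for (int i = 0; i < valueLength; i++) {
            out[i] = data.get(valueOffset + i);
        }
        return out;
    }

    /** Test helper (assumed layout): packs values as length-prefixed records. */
    static ByteBuffer encode(byte[]... values) {
        int size = 0;
        for (byte[] v : values) {
            size += Integer.BYTES + v.length;
        }
        ByteBuffer out = ByteBuffer.allocate(size);
        for (byte[] v : values) {
            out.putInt(v.length).put(v);
        }
        out.flip();
        return out;
    }
}
```

The sketch omits bounds checks on corrupt lengths, and the accessors are only 
meaningful until the next call to advance().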

The cursor/flyweight approach would be about as difficult to implement as an 
arena-based approach. As you know, I have a partially completed 
allocation-free reader, and I can tell you that the real difficulty with 
eliminating that byte array allocation isn’t so much reworking the core read 
path loop; it’s dealing with the myriad places where C* casually leaks objects 
from the read path into various other systems.

Since the read path (and the places that touch it internally) was written 
with the assumption that the JVM will manage the lifetime of all the objects 
it creates, no effort has been made to manage the ownership of objects coming 
out of the read path. You allocate an array for the contents of a cell and it 
will probably get thrown away immediately, but it might also live for the next 
hour, and there’s no way to determine that at the time of allocation. Objects 
also leak out of the core read loop progressively, so it’s not like you can 
just make copies onto the heap for the exceptions, because you’d be copying a 
significant portion of what you’re reading, canceling out most of the benefit 
of your work.
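
The ownership problem can be illustrated with a small sketch. All names here 
(ReusableCell, ReusingReader, LeakDemo) are hypothetical and stand in for the 
real read path only loosely. The reader reuses a single cell object; a caller 
that retains the reference past the loop, which is harmless when the JVM owns 
every object's lifetime, silently observes overwritten data. Every such escape 
point needs a copy, and each copy is an allocation again.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mutable cell that the reader overwrites on every step. */
final class ReusableCell {
    long value;
}

/** Hypothetical reader that hands out the same cell instance each time. */
final class ReusingReader {
    private final long[] source;
    private final ReusableCell cell = new ReusableCell();
    private int index;

    ReusingReader(long[] source) {
        this.source = source;
    }

    /** Returns the shared cell, valid only until the next call; null at end. */
    ReusableCell next() {
        if (index == source.length) {
            return null;
        }
        cell.value = source[index++];
        return cell;
    }
}

final class LeakDemo {
    /** Retains references past the loop: every entry aliases the same cell. */
    static List<Long> retainWithoutCopy(long[] source) {
        List<ReusableCell> retained = new ArrayList<>();
        ReusingReader reader = new ReusingReader(source);
        for (ReusableCell c = reader.next(); c != null; c = reader.next()) {
            retained.add(c); // the "casual leak" out of the read loop
        }
        List<Long> seen = new ArrayList<>();
        for (ReusableCell c : retained) {
            seen.add(c.value);
        }
        return seen;
    }

    /** Copies the value while it is still valid; correct, but it allocates. */
    static List<Long> retainWithCopy(long[] source) {
        List<Long> seen = new ArrayList<>();
        ReusingReader reader = new ReusingReader(source);
        for (ReusableCell c = reader.next(); c != null; c = reader.next()) {
            seen.add(c.value);
        }
        return seen;
    }
}
```

For the input {1, 2, 3}, retainWithoutCopy returns [3, 3, 3] while 
retainWithCopy returns [1, 2, 3].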

I think any effort to make an allocation-free read and compaction path (or 
even one that doesn’t allocate byte arrays) probably needs to start with 
creating and enforcing a stricter barrier between the internals of the read 
path and the rest of the database.
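
One possible way to enforce such a barrier, sketched under assumptions: values 
are only handed out inside a scope, they fail loudly if used after the scope 
ends, and crossing the barrier requires an explicit copy. ScopedValue and 
ReadBarrier are hypothetical names for illustration, not existing Cassandra 
classes, and a real design would also have to avoid allocating the wrapper 
itself.

```java
import java.nio.ByteBuffer;
import java.util.function.Consumer;

/** Hypothetical view that is only usable while its scope is open. */
final class ScopedValue {
    private ByteBuffer buffer;
    private boolean valid;

    void bind(ByteBuffer buffer) {
        this.buffer = buffer;
        this.valid = true;
    }

    void invalidate() {
        this.valid = false;
        this.buffer = null;
    }

    /** Reads in place; throws if the reference escaped its scope. */
    long asLong() {
        check();
        return buffer.getLong(0);
    }

    /** Explicit, visible copy for data that must cross the barrier. */
    byte[] copyOut() {
        check();
        byte[] out = new byte[buffer.remaining()];
        buffer.duplicate().get(out); // duplicate keeps the source position intact
        return out;
    }

    private void check() {
        if (!valid) {
            throw new IllegalStateException("value used outside the read path scope");
        }
    }
}

/** Hypothetical boundary: the rest of the system only sees scoped values. */
final class ReadBarrier {
    static void withValue(ByteBuffer data, Consumer<ScopedValue> body) {
        ScopedValue value = new ScopedValue();
        value.bind(data);
        try {
            body.accept(value);
        } finally {
            value.invalidate(); // any retained reference now fails fast
        }
    }
}
```

With this shape, a leak shows up as an exception at the point of misuse rather 
than as silently stale data, and each copyOut() call marks a place where data 
leaves the read path.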

> Eliminate byte array allocation in ByteArrayAccessor.read
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-20428
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20428
>             Project: Apache Cassandra
>          Issue Type: Improvement
>            Reporter: Jon Haddad
>            Priority: Normal
>         Attachments: allocation-reverse.html, 
> image-2025-03-11-11-05-55-378.png
>
>
> During compaction we allocate a new byte[] in ByteArrayAccessor.read.  This 
> is one of the hottest paths in the codebase, hit during writes, compaction, 
> creating tables, and possibly others.  In my performance tests using default 
> compaction settings of 64MB I see this responsible for 40% of allocations.  
> This is largely what drives GC pause frequency and duration.  If we are able 
> to eliminate the O(N) allocations performed here, this might be one of the 
> best optimizations we could do for the number of things it touches.
> Allocation profile attached.
> !image-2025-03-11-11-05-55-378.png|width=514,height=269!
>  



