[ https://issues.apache.org/jira/browse/CASSANDRA-20176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17909159#comment-17909159 ]

Dmitry Konstantinov commented on CASSANDRA-20176:
-------------------------------------------------

Hi [~benedict], thank you for taking a look and sharing your thoughts. Yes, I 
agree that such a revision would be a much better option; this question has been 
on my mind for quite a long time too. The main problem with it is that I am 
afraid it can be a very time-consuming story with an unclear ending :). At least 
that is the impression I got after analyzing tickets like these some time ago:
https://issues.apache.org/jira/browse/CASSANDRA-10989
https://issues.apache.org/jira/browse/CASSANDRA-4718
https://issues.apache.org/jira/browse/CASSANDRA-16499
https://issues.apache.org/jira/browse/CASSANDRA-1632

So, I can help here, but it looks like we need to define clearer goals and 
first steps to start with.

In this direction I think virtual threads can be a good alternative in the long 
term: using virtual threads can help to avoid blocking native transport 
request threads while awaiting coordination results. And the code should be much 
simpler compared, for example, to a reactive approach, which in my 
experience is a nightmare from a troubleshooting point of view...
The problems with virtual threads are:
 - they are available only in the latest Java versions, while Cassandra is quite 
conservative in adopting new versions;
 - they are not very friendly with mutable thread locals, which we use quite 
widely in the Cassandra code base now.
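To illustrate the first point: a minimal sketch of the idea (assuming Java 21+; all names here are illustrative, not the actual Cassandra code). Each request runs on its own virtual thread, so blocking on a coordination result parks only the cheap virtual thread rather than pinning a native transport thread:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class VirtualThreadSketch {
    // Stand-in for a coordinator call whose result the request thread must await.
    static CompletableFuture<String> remoteReplicaCall() {
        return CompletableFuture.supplyAsync(() -> {
            try { TimeUnit.MILLISECONDS.sleep(10); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return "ack";
        });
    }

    public static void main(String[] args) throws Exception {
        // One virtual thread per task: join() parks the virtual thread only,
        // the underlying carrier (platform) thread is released to run other work.
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            String result = exec.submit(() -> remoteReplicaCall().join()).get();
            System.out.println(result);
        }
    }
}
```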

 

Returning to the original story/idea: in production people usually try not to 
saturate their systems under normal workload, so I suppose some amount of 
spinning still takes place. While that is not an issue from a CPU consumption 
point of view, the extra memory allocation may still have an impact on GC and 
CPU cache efficiency.

8.5% of allocations does not look big, but the issue is that the overall memory 
allocation is distributed across many such places; each one eats a bit, but in 
sum such overheads can equal or even exceed the main allocations directly 
related to actual data processing.

So, my idea was: if there is a cheap enough way to make a local change and get 
a benefit without a massive re-design/re-testing effort, it would be worth doing. 
If that is not an option, then I agree it is not worth it. I will try to think 
about it a bit in the background, and if I find a simple way to do it I will 
share it for review.

A short version:
 - for the current story I will try to think a bit in the background to find a 
simple local change without spending a lot of effort;
 - regarding the overall executor revision, I am ready to help with it, but I 
think we need a plan for where to start.

> Reduce memory allocation in SEP Worker spin wait logic
> ------------------------------------------------------
>
>                 Key: CASSANDRA-20176
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20176
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Other
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>         Attachments: image-2025-01-01-13-14-02-562.png, 
> image-2025-01-01-13-15-16-767.png
>
>
> There is quite massive memory allocation within the spin waiting logic in the 
> SEP Executor (org.apache.cassandra.concurrent.SEPWorker#doWaitSpin) for some 
> workloads. For example, it is observed for the write test described in 
> CASSANDRA-20165, where ~8.5% of total allocations come from this logic:
> !image-2025-01-01-13-14-02-562.png|width=570!
> !image-2025-01-01-13-15-16-767.png|width=570!
> The idea of this parking is to avoid unpark signalling costs. The logic 
> selects a random time period to park a thread via LockSupport.parkNanos and 
> puts the thread into a ConcurrentSkipListMap using the wake-up time as the 
> key, so the map is used as a concurrent priority queue. Once the parking is 
> finished, the thread removes itself from the map. When we need to schedule a 
> task, we take the spinning thread with the smallest wake-up time from the map.
> We can try to implement another algorithm for this logic without memory 
> allocation overheads, for example one based on a Timing Wheel data structure.
> Note: it also makes sense to check the granularity of the actual parking time 
> (https://hazelcast.com/blog/locksupport-parknanos-under-the-hood-and-the-curious-case-of-parking/)
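A minimal sketch of the map-as-priority-queue pattern the description refers to (the names are illustrative, not the actual SEPWorker code, and key collisions between threads are ignored here for brevity). Note how each park cycle allocates at least a boxed Long key plus a skip-list node, which is the overhead the ticket is about:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.LockSupport;

public class SpinParkSketch {
    // Wake-up deadline (nanos) -> parked thread; the skip list map acts as a
    // concurrent priority queue ordered by wake-up time.
    static final ConcurrentSkipListMap<Long, Thread> spinning = new ConcurrentSkipListMap<>();

    static void doWaitSpin() {
        // Random park period, here bounded by 1 ms for the sketch.
        long sleep = ThreadLocalRandom.current().nextLong(1, 1_000_000L);
        long wakeUp = System.nanoTime() + sleep;
        spinning.put(wakeUp, Thread.currentThread()); // allocates a boxed Long + map node
        LockSupport.parkNanos(sleep);
        spinning.remove(wakeUp); // once parking is finished, remove ourselves
    }

    // Scheduler side: wake the spinner with the smallest wake-up time.
    static Thread takeSoonest() {
        Map.Entry<Long, Thread> e = spinning.pollFirstEntry();
        return e == null ? null : e.getValue();
    }
}
```

A timing wheel would replace the ordered map with a fixed ring of buckets indexed by (deadline / tick) % wheelSize, trading exact ordering for allocation-free, O(1) insert and expiry.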



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
