[ https://issues.apache.org/jira/browse/CASSANDRA-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17927106#comment-17927106 ]
Benedict Elliott Smith commented on CASSANDRA-20226:
----------------------------------------------------

It's interesting that 1/3rd of the time is spent in trySwapRegion. That
suggests that we might like to have some thread begin allocating a new region
before we reach the end of the current one. The problem there is having an
efficient metric for deciding when this should happen, as we don't want to
increase memory consumption in workloads with lots of infrequently updated
memtables.

Regarding the other 2/3rds, one option to consider is to estimate the space
required and reserve it before executing, so that we have a thread-local limit
that we expect to be sufficient. This should reduce the SubAllocator.acquired
and SubPool.tryAllocate overheads.

> Reduce contention in NativeAllocator.allocate
> ---------------------------------------------
>
>                 Key: CASSANDRA-20226
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20226
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Local/Memtable
>            Reporter: Dmitry Konstantinov
>            Assignee: Dmitry Konstantinov
>            Priority: Normal
>         Attachments: cpu_profile_batch.html,
> image-2025-01-20-23-38-58-896.png, profile.yaml
>
>
> For a high insert batch rate, it looks like we have a bottleneck in
> NativeAllocator.allocate, probably caused by contention within the logic.
> !image-2025-01-20-23-38-58-896.png|width=300!
> [^cpu_profile_batch.html]
> The logic has at least the following 2 potential places to assess:
> # allocation cycle in MemtablePool.SubPool#tryAllocate. This logic has a
> while loop with a CAS, which can be inefficient under high contention.
> Similar to CASSANDRA-15922, we can try to replace it with addAndGet (we need
> to check that this does not break the allocator logic)
> # swap region logic in NativeAllocator.trySwapRegion (under a high insert
> rate, 1MiB regions can be swapped quite frequently)
> Reproducing test details:
> * test logic
> {code:java}
> ./tools/bin/cassandra-stress "user profile=./profile.yaml no-warmup ops(insert=1) n=10m" -rate threads=100 -node somenode
> {code}
> * Cassandra version: 5.0.3
> * configuration changes compared to default:
> {code:java}
> memtable_allocation_type: offheap_objects
> memtable:
>   configurations:
>     skiplist:
>       class_name: SkipListMemtable
>     trie:
>       class_name: TrieMemtable
>       parameters:
>         shards: 32
>     default:
>       inherits: trie
> {code}
> * 1 node cluster
> * OpenJDK jdk-17.0.12+7
> * Linux kernel: 4.18.0-240.el8.x86_64
> * CPU: 16 cores, Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
> * RAM: 46GiB
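
For readers unfamiliar with the trade-off named in the first item of the issue description (a CAS retry loop versus addAndGet), here is a minimal sketch of the two styles on a bounded counter. It is not Cassandra's MemtablePool.SubPool code: the class BoundedCounter, its methods, and the single fixed limit are all hypothetical and exist only for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bounded counter; NOT the real MemtablePool.SubPool. */
final class BoundedCounter {
    private final long limit;
    private final AtomicLong allocated = new AtomicLong();

    BoundedCounter(long limit) {
        this.limit = limit;
    }

    /**
     * CAS retry loop: the limit is never exceeded, but every failed
     * compareAndSet forces a re-read and another attempt, so contended
     * threads can spin here.
     */
    boolean tryAllocateCas(long size) {
        while (true) {
            long cur = allocated.get();
            if (cur + size > limit)
                return false;
            if (allocated.compareAndSet(cur, cur + size))
                return true;
        }
    }

    /**
     * Single atomic add, rolled back if it overshoots. There is no retry
     * loop, but the counter can sit above the limit for a moment, and a
     * concurrent smaller request may be rejected during that window.
     */
    boolean tryAllocateAdd(long size) {
        long next = allocated.addAndGet(size);
        if (next <= limit)
            return true;
        allocated.addAndGet(-size); // undo the overshoot
        return false;
    }

    long allocated() {
        return allocated.get();
    }
}
```

The transient overshoot in tryAllocateAdd is one example of the kind of behaviour change that the "check that this does not break the allocator logic" caveat would need to rule out. Whether it matters for the real allocator depends on what else reads the counter, which this sketch does not model.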