[ https://issues.apache.org/jira/browse/CASSANDRA-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17488240#comment-17488240 ]
Branimir Lambov commented on CASSANDRA-17240:
---------------------------------------------
Attached some performance data comparing the new trie memtable with the legacy
skip list one. The test we ran is a density test which runs a 90:10 write:read
workload with 100-byte payloads to over 1TB of data on an {{i3.4xlarge}}
instance with the following settings to remove some of the biggest throughput
bottlenecks:
{code:java}
memtable_allocation_type: offheap_objects
memtable_flush_writers: 8
memtable_heap_space_in_mb: 16384
memtable_offheap_space_in_mb: 16384
concurrent_reads: 256
concurrent_writes: 256
commitlog_total_space_in_mb: 51200
commitlog_segment_size_in_mb: 320
# commitlog_compression takes a list of compressor entries
commitlog_compression:
  - class_name: LZ4Compressor
# only index files are mmapped; data files use standard I/O
disk_access_mode: mmap_index_only
file_cache_size_in_mb: 8192
# 0 disables compaction throughput throttling
compaction_throughput_mb_per_sec: 0
concurrent_compactors: 30
{code}
The test is meant to measure sustained throughput, and with the current C* code
is quickly limited by the performance of compaction (compaction cannot keep up,
sstables accumulate, and reads start dominating the time). The throughput stage
graph looks like this:
!throughput_apache.png!
{{TrieMemtable}} (in red) starts off with double the throughput of the legacy
{{SkipListMemtable}} (in orange) and maintains a significant lead throughout
the test. We have previously seen a significant improvement in throughput when
memtables are sharded, so we also tested two sharded variations of the skip
list solution, with and without locking. Both versions lead over the unsharded
skip list, but are far from the performance of the new solution. (Note: the
locking version (in green), which gives compaction threads more chances to run,
matches the trie memtable's throughput towards the end of the test, when
performance is completely dominated by the effects of compaction.)
With improved and tuned compaction (using further improvements we intend to
port to C*), the trie memtable maintains ~2.3x better throughput:
!throughput_SG.png!
One interesting aspect of the comparison is the heap behavior, especially old
generation sizes, during the throughput stage.
{{SkipListMemtable}}:
!SkipListMemtable-OSS.png!
vs. {{TrieMemtable}}:
!TrieMemtable-OSS.png!
The total garbage collection time through all stages of the test is more than
halved.
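The comment does not say how the GC time was recorded. As one hedged illustration of what a "total garbage collection time" figure sums up inside a single JVM, the standard {{java.lang.management}} API exposes cumulative per-collector counters (young and old generation collectors alike). The class {{GcTimeProbe}} and the allocation-heavy "stages" below are hypothetical stand-ins, not the benchmark's tooling.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative only: sums the JVM's own cumulative GC time counters.
class GcTimeProbe {
    // Volatile sink so the allocations in main are not optimized away.
    static volatile byte[] sink;

    // Accumulated collection time in ms across all collectors of this JVM.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // GC time spent while one stage runs; the counters are cumulative, so a delta suffices.
    static long gcMillisDuring(Runnable stage) {
        long before = totalGcMillis();
        stage.run();
        return totalGcMillis() - before;
    }

    public static void main(String[] args) {
        // Hypothetical stage: allocate many short-lived objects.
        Runnable stage = () -> {
            for (int i = 0; i < 2_000_000; i++) {
                sink = new byte[64];
            }
        };
        long total = 0;
        for (int i = 0; i < 3; i++) {
            total += gcMillisDuring(stage); // summed over all stages
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount() + " collections");
        }
        System.out.println("GC time across stages: " + total + " ms");
    }
}
```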
Additionally, the new memtable is able to accept more data for the same memory
allocation, which results in 30% bigger L0 sstables, reducing the number of
sstables and the need for compaction and further improving performance.
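A back-of-the-envelope sketch of why bigger flushes mean fewer sstables: for a fixed volume of flushed data, the L0 sstable count scales with 1/(flush size), so 30% bigger sstables give about 1/1.3 ≈ 0.77 times as many, i.e. roughly 23% fewer. The 1 TiB volume and the 1 GiB skip-list flush size below are hypothetical round numbers, not figures from the test, and the model ignores flushes triggered for other reasons (for example by commit log space pressure).

```java
// Hypothetical arithmetic only; sizes are illustrative, not measured.
class FlushCountEstimate {
    // Number of flushes (one L0 sstable each) needed to write the given volume.
    static long flushes(long bytesWritten, long bytesPerFlush) {
        return (bytesWritten + bytesPerFlush - 1) / bytesPerFlush; // ceiling division
    }

    public static void main(String[] args) {
        long data = 1L << 40;                     // 1 TiB written in total (assumed)
        long skipListFlush = 1L << 30;            // assumed 1 GiB per skip-list flush
        long trieFlush = skipListFlush * 13 / 10; // 30% bigger, as reported for the trie memtable
        long a = flushes(data, skipListFlush);
        long b = flushes(data, trieFlush);
        System.out.printf("skip list: %d sstables, trie: %d sstables (%.1f%% fewer)%n",
                a, b, 100.0 * (a - b) / a);
    }
}
```

Up to rounding, the reduction does not depend on the assumed sizes; only the 30% ratio matters.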
> CEP-19: Trie memtable implementation
> ------------------------------------
>
> Key: CASSANDRA-17240
> URL: https://issues.apache.org/jira/browse/CASSANDRA-17240
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Memtable
> Reporter: Branimir Lambov
> Priority: Normal
> Attachments: SkipListMemtable-OSS.png, TrieMemtable-OSS.png,
> density_SG.html.gz, density_test_with_sharding.html.gz, throughput_SG.png,
> throughput_apache.png
>
>
> Trie-based memtable implementation as described in CEP-19, built on top of
> CASSANDRA-17034 and CASSANDRA-6936.
> The implementation is available in this
> [branch|https://github.com/blambov/cassandra/tree/CASSANDRA-17240].
--
This message was sent by Atlassian Jira
(v8.20.1#820001)