Re: [DISCUSS] CEP-19: Trie memtable implementation

Jeremiah D Jordan Tue, 08 Feb 2022 12:29:25 -0800

I don’t really see most users touching the default implementation.  I would 
expect the main reasons someone would change to be:
1. They run into some bug that is only in one of the implementations.
2. They have persistent memory and so want to use 
https://issues.apache.org/jira/browse/CASSANDRA-13981


Given that I doubt most people will touch it, I think it is good to give 
advanced operators the ability to have more control over switching to things 
that have new performance characteristics.  So I like the proposed 
configuration approach, which allows someone to change to a new 
implementation one node at a time and only for specific tables.
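
To make the "one node at a time, only for specific tables" point concrete, 
here is a rough sketch of how a node might resolve a table's memtable from 
the templates Branimir describes further down the thread. This is not 
Cassandra's code: resolve_memtable, the "canary" template name, and the 
fallback to SkipListMemtable when nothing is configured are all assumptions 
made up for illustration.

```python
# Hypothetical sketch only: function name, template names and the built-in
# fallback are illustrative assumptions, not Cassandra's actual behaviour.

def resolve_memtable(node_yaml, table_options):
    """Return the memtable configuration one node would use for one table."""
    # The schema may carry a full memtable configuration instead of a template.
    if "class" in table_options:
        return dict(table_options)

    templates = node_yaml.get("memtable_templates", {})
    # A template named in the schema wins; otherwise use the node's default.
    name = table_options.get("template")
    if name is None:
        name = node_yaml.get("memtable", {}).get("template")
    if name is None:
        return {"class": "SkipListMemtable"}  # assumed built-in default
    if name not in templates:
        raise KeyError(f"unknown memtable template: {name}")
    return templates[name]


# Two nodes share the same schema but define the "canary" template differently.
node_a = {
    "memtable_templates": {
        "skiplist": {"class": "SkipListMemtable"},
        "canary": {"class": "TrieMemtable", "shards": 16},
    },
    "memtable": {"template": "skiplist"},
}
node_b = {
    "memtable_templates": {
        "skiplist": {"class": "SkipListMemtable"},
        "canary": {"class": "SkipListMemtable"},
    },
    "memtable": {"template": "skiplist"},
}

events_table = {"template": "canary"}  # opted in via the schema
users_table = {}                       # uses each node's default
```

With this setup only node_a runs the trie memtable, and only for the table 
that selected the "canary" template. Rolling the change out further means 
editing the yaml on the next node, with no schema change.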

> On Feb 8, 2022, at 2:21 PM, Dinesh Joshi <[email protected]> wrote:
> 
> Thank you for sharing the perf test results.
> 
> Going back to the schema vs yaml configuration. I am concerned users may pick 
> the wrong implementation for their use-case. Is there any chance for us to 
> automatically pick a MemTable implementation based on heuristics? Do we 
> foresee users ever picking the existing SkipList implementation over the Trie 
> one? Given the performance tests, it seems the Trie implementation is the clear 
> winner.
> 
> To be clear, I am not suggesting we remove the existing implementation. I am 
> for maintaining a pluggable API for various components.
> 
> Dinesh
> 
>> On Feb 7, 2022, at 8:39 AM, Branimir Lambov <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Added some performance results to the ticket: 
>> https://issues.apache.org/jira/browse/CASSANDRA-17240
>> 
>> Regards,
>> Branimir
>> 
>> On Sat, Feb 5, 2022 at 10:59 PM Dinesh Joshi <[email protected] 
>> <mailto:[email protected]>> wrote:
>> This is excellent. Thanks for opening up this CEP. It would be great to get 
>> some stats around GC allocation rate / memory pressure, read & write 
>> latencies, etc. compared to existing implementation.
>> 
>> Dinesh
>> 
>>> On Jan 18, 2022, at 2:13 AM, Branimir Lambov <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> The memtable pluggability API (CEP-11) is per-table to enable memtable 
>>> selection that suits specific workflows. It also makes full sense to permit 
>>> per-node configuration, both to be able to modify the configuration to suit 
>>> heterogeneous deployments better, as well as to test changes for 
>>> improvements such as this one.
>>> Recognizing this, the patch comes with a modification to the API 
>>> <https://github.com/blambov/cassandra/commit/24b558ba2f71a2f040804e28993cc914b31298f5>
>>>  that defines memtable templates in cassandra.yaml (i.e. per node) and 
>>> allows the schema to select a template (in addition to being able to 
>>> specify the full memtable configuration). One could use this e.g. by adding:
>>> memtable_templates:
>>>     trie:
>>>         class: TrieMemtable
>>>         shards: 16
>>>     skiplist:
>>>         class: SkipListMemtable
>>> memtable:
>>>     template: skiplist
>>> (which defines two templates and specifies the default memtable 
>>> implementation to use) to cassandra.yaml and specifying  WITH memtable = 
>>> {'template' : 'trie'} in the table schema.
>>> 
>>> I intend to commit this modification with the memtable API 
>>> (CASSANDRA-17034/CEP-11).
>>> 
>>> Performance comparisons will be published soon.
>>> 
>>> Regards,
>>> Branimir
>>> 
>>> On Fri, Jan 14, 2022 at 4:15 PM Jeff Jirsa <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> Sounds like a great addition
>>> 
>>> Can you share some of the details around gc and latency improvements you’ve 
>>> observed with the list? 
>>> 
>>> Any specific reason the configuration is through schema vs yaml? Presumably 
>>> it’s so a user can test per table, but this changes every host in a 
>>> cluster, so the impact of a bug/regression is much higher. 
>>> 
>>> 
>>>> On Jan 10, 2022, at 1:30 AM, Branimir Lambov <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> 
>>>> We would like to contribute our TrieMemtable to Cassandra. 
>>>> 
>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-19%3A+Trie+memtable+implementation
>>>> 
>>>> This is a new memtable solution intended to replace the legacy 
>>>> implementation, developed with the following objectives:
>>>> - lowering the on-heap complexity and making it possible to store memtable 
>>>> indexing structures off-heap,
>>>> - leveraging byte order and a trie structure to lower the memory footprint 
>>>> and improve mutation and lookup performance.
>>>> 
>>>> The new memtable relies on CASSANDRA-6936 to translate to and from 
>>>> byte-ordered representations of types, and CASSANDRA-17034 / CEP-11 to 
>>>> plug into Cassandra. The memtable is built on multiple shards of custom 
>>>> in-memory single-writer multiple-reader tries, whose implementation uses a 
>>>> combination of state-of-the-art and novel features for greater efficiency.
>>>> 
>>>> The CEP's JIRA ticket 
>>>> (https://issues.apache.org/jira/browse/CASSANDRA-17240) contains the 
>>>> initial version of the implementation. In its current form it achieves 
>>>> much better garbage collection latency, significantly bigger data sizes 
>>>> between flushes for the same memory allocation, as well as drastically 
>>>> increased write throughput, and we expect the memory and garbage 
>>>> collection improvements to go much further with upcoming improvements to 
>>>> the solution.
>>>> 
>>>> I am interested in hearing your thoughts on the proposal.
>>>> 
>>>> Regards,
>>>> Branimir
>>>> 
>> 
>
