Re: [DISCUSS] CEP-19: Trie memtable implementation

Dinesh Joshi Tue, 08 Feb 2022 12:40:21 -0800

My quick reading of the code suggests that the schema will override the 
operator's default preference in the YAML. In the event of a bug in the new 
implementation, there could be a situation where the operator needs to 
override this via the YAML.
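
As a rough illustration of that precedence, here is a minimal sketch. It is hypothetical and is not the actual Cassandra code: the function name `resolve_memtable` and the dictionary layout are assumptions, modeled on the cassandra.yaml example quoted further down in this thread.

```python
from typing import Optional

# Assumed in-memory form of the cassandra.yaml example quoted in this thread.
NODE_CONFIG = {
    "memtable_templates": {
        "trie": {"class": "TrieMemtable", "shards": 16},
        "skiplist": {"class": "SkipListMemtable"},
    },
    "memtable": {"template": "skiplist"},  # the operator's per-node default
}


def resolve_memtable(node_config: dict, table_option: Optional[dict]) -> dict:
    """Return the memtable settings one table would get on one node.

    Hypothetical helper: the table schema option wins whenever it is set,
    and the node-level default applies only when the schema says nothing.
    """
    templates = node_config.get("memtable_templates", {})
    chosen = table_option if table_option else node_config.get("memtable", {})
    if "template" in chosen:
        name = chosen["template"]
        if name not in templates:
            raise KeyError(f"unknown memtable template: {name}")
        # Copy so callers cannot mutate the node-level template.
        return dict(templates[name])
    # No template named: treat the option as a full inline configuration.
    return dict(chosen)
```

Under these assumptions, a table created WITH memtable = {'template' : 'trie'} keeps resolving to the trie template whatever the operator sets as the node default, and that is the case where a YAML-level override would be useful.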


> On Feb 8, 2022, at 12:29 PM, Jeremiah D Jordan <[email protected]> 
> wrote:
> 
> I don’t really see most users touching the default implementation.  I would 
> expect the main reason someone would change would be
> 1. They run into some bug that is only in one of the implementations.
> 2. They have persistent memory and so want to use 
> https://issues.apache.org/jira/browse/CASSANDRA-13981 
> <https://issues.apache.org/jira/browse/CASSANDRA-13981>
> 
> Given that I doubt most people will touch it, I think it is good to give 
> advanced operators the ability to have more control over switching to things 
> that have new performance characteristics.  So I like that the proposed 
> configuration approach allows someone to change to a new implementation 
> one node at a time and only for specific tables.
> 
>> On Feb 8, 2022, at 2:21 PM, Dinesh Joshi <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> Thank you for sharing the perf test results.
>> 
>> Going back to the schema vs yaml configuration. I am concerned users may 
>> pick the wrong implementation for their use-case. Is there any chance for us 
>> to automatically pick a MemTable implementation based on heuristics? Do we 
>> foresee users ever picking the existing SkipList implementation over the 
>> Trie? Given the performance tests, it seems the Trie implementation is the 
>> clear winner.
>> 
>> To be clear, I am not suggesting we remove the existing implementation. I am 
>> for maintaining a pluggable API for various components.
>> 
>> Dinesh
>> 
>>> On Feb 7, 2022, at 8:39 AM, Branimir Lambov <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> Added some performance results to the ticket: 
>>> https://issues.apache.org/jira/browse/CASSANDRA-17240 
>>> <https://issues.apache.org/jira/browse/CASSANDRA-17240>
>>> 
>>> Regards,
>>> Branimir
>>> 
>>> On Sat, Feb 5, 2022 at 10:59 PM Dinesh Joshi <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> This is excellent. Thanks for opening up this CEP. It would be great to get 
>>> some stats around GC allocation rate / memory pressure, read & write 
>>> latencies, etc. compared to the existing implementation.
>>> 
>>> Dinesh
>>> 
>>>> On Jan 18, 2022, at 2:13 AM, Branimir Lambov <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> 
>>>> The memtable pluggability API (CEP-11) is per-table to enable memtable 
>>>> selection that suits specific workflows. It also makes full sense to 
>>>> permit per-node configuration, both to be able to modify the configuration 
>>>> to suit heterogeneous deployments better, as well as to test changes for 
>>>> improvements such as this one.
>>>> Recognizing this, the patch comes with a modification to the API 
>>>> <https://github.com/blambov/cassandra/commit/24b558ba2f71a2f040804e28993cc914b31298f5>
>>>>  that defines memtable templates in cassandra.yaml (i.e. per node) and 
>>>> allows the schema to select a template (in addition to being able to 
>>>> specify the full memtable configuration). One could use this e.g. by 
>>>> adding:
>>>> memtable_templates:
>>>>     trie:
>>>>         class: TrieMemtable
>>>>         shards: 16
>>>>     skiplist:
>>>>         class: SkipListMemtable
>>>> memtable:
>>>>     template: skiplist
>>>> (which defines two templates and specifies the default memtable 
>>>> implementation to use) to cassandra.yaml and specifying  WITH memtable = 
>>>> {'template' : 'trie'} in the table schema.
>>>> 
>>>> I intend to commit this modification with the memtable API 
>>>> (CASSANDRA-17034/CEP-11).
>>>> 
>>>> Performance comparisons will be published soon.
>>>> 
>>>> Regards,
>>>> Branimir
>>>> 
>>>> On Fri, Jan 14, 2022 at 4:15 PM Jeff Jirsa <[email protected] 
>>>> <mailto:[email protected]>> wrote:
>>>> Sounds like a great addition
>>>> 
>>>> Can you share some of the details around gc and latency improvements 
>>>> you’ve observed with the list? 
>>>> 
>>>> Any specific reason the configuration is through schema vs yaml? Presumably 
>>>> it’s so a user can test per table, but this changes every host in a 
>>>> cluster, so the impact of a bug/regression is much higher. 
>>>> 
>>>> 
>>>>> On Jan 10, 2022, at 1:30 AM, Branimir Lambov <[email protected] 
>>>>> <mailto:[email protected]>> wrote:
>>>>> 
>>>>> 
>>>>> We would like to contribute our TrieMemtable to Cassandra. 
>>>>> 
>>>>> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-19%3A+Trie+memtable+implementation
>>>>>  
>>>>> <https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-19%3A+Trie+memtable+implementation>
>>>>> 
>>>>> This is a new memtable solution intended to replace the legacy 
>>>>> implementation, developed with the following objectives:
>>>>> - lowering the on-heap complexity and making it possible to store 
>>>>> memtable indexing structures off-heap,
>>>>> - leveraging byte order and a trie structure to lower the memory 
>>>>> footprint and improve mutation and lookup performance.
>>>>> 
>>>>> The new memtable relies on CASSANDRA-6936 to translate to and from 
>>>>> byte-ordered representations of types, and CASSANDRA-17034 / CEP-11 to 
>>>>> plug into Cassandra. The memtable is built on multiple shards of custom 
>>>>> in-memory single-writer multiple-reader tries, whose implementation uses 
>>>>> a combination of state-of-the-art and novel features for greater 
>>>>> efficiency.
>>>>> 
>>>>> The CEP's JIRA ticket 
>>>>> (https://issues.apache.org/jira/browse/CASSANDRA-17240 
>>>>> <https://issues.apache.org/jira/browse/CASSANDRA-17240>) contains the 
>>>>> initial version of the implementation. In its current form it achieves 
>>>>> much better garbage collection latency, significantly bigger data sizes 
>>>>> between flushes for the same memory allocation, as well as drastically 
>>>>> increased write throughput, and we expect the memory and garbage 
>>>>> collection improvements to go much further with upcoming improvements to 
>>>>> the solution.
>>>>> 
>>>>> I am interested in hearing your thoughts on the proposal.
>>>>> 
>>>>> Regards,
>>>>> Branimir
>>>>> 
>>> 
>> 
>
