Re: [DISCUSS] CEP-19: Trie memtable implementation

bened...@apache.org Tue, 08 Feb 2022 12:28:05 -0800

FWIW, I think the proposed approach to configuration is fine.

I think selecting a choice for the user should be done simply and 
deterministically. We should probably default to Trie based memtables for users 
with a fresh config file, and we can consider changing the default in a later 
release for those with an old config file that does not specify an 
implementation.

From: Dinesh Joshi <djo...@apache.org>
Date: Tuesday, 8 February 2022 at 20:21
To: dev@cassandra.apache.org <dev@cassandra.apache.org>
Subject: Re: [DISCUSS] CEP-19: Trie memtable implementation

Thank you for sharing the perf test results.

Going back to the schema vs yaml configuration. I am concerned users may pick 
the wrong implementation for their use-case. Is there any chance for us to 
automatically pick a MemTable implementation based on heuristics? Do we foresee 
users ever picking the existing SkipList implementation over the Trie? Given the 
performance tests, it seems the Trie implementation is the clear winner.

To be clear, I am not suggesting we remove the existing implementation. I am 
for maintaining a pluggable API for various components.

Dinesh

On Feb 7, 2022, at 8:39 AM, Branimir Lambov 
<blam...@apache.org> wrote:

Added some performance results to the ticket: 
https://issues.apache.org/jira/browse/CASSANDRA-17240

Regards,
Branimir

On Sat, Feb 5, 2022 at 10:59 PM Dinesh Joshi 
<djo...@apache.org> wrote:
This is excellent. Thanks for opening up this CEP. It would be great to get 
some stats around GC allocation rate / memory pressure, read & write latencies, 
etc. compared to the existing implementation.

Dinesh

On Jan 18, 2022, at 2:13 AM, Branimir Lambov 
<blam...@apache.org> wrote:

The memtable pluggability API (CEP-11) is per-table to enable memtable 
selection that suits specific workflows. It also makes full sense to permit 
per-node configuration, both to adapt the configuration to heterogeneous 
deployments and to test changes for improvements such as this one.
Recognizing this, the patch comes with a modification to the API 
(https://github.com/blambov/cassandra/commit/24b558ba2f71a2f040804e28993cc914b31298f5) 
that defines memtable templates in cassandra.yaml (i.e. per node) and allows 
the schema to select a template (in addition to being able to specify the full 
memtable configuration). One could use this e.g. by adding:

memtable_templates:
    trie:
        class: TrieMemtable
        shards: 16
    skiplist:
        class: SkipListMemtable
memtable:
    template: skiplist
(which defines two templates and specifies the default memtable implementation 
to use) to cassandra.yaml and specifying WITH memtable = {'template' : 'trie'} 
in the table schema.
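
To make the lookup concrete, here is a minimal Python model of how a per-table 
option could be resolved against the per-node templates. It is only a sketch of 
the behaviour described above, not the code in the patch: the function name 
resolve_memtable, the dictionary layout, and the rule that an option without a 
'template' key is treated as a full inline configuration are all assumptions 
made for illustration.

```python
# Illustrative model only; NOT Cassandra's implementation.
# NODE_CONFIG mirrors the cassandra.yaml fragment shown above.
NODE_CONFIG = {
    "memtable_templates": {
        "trie": {"class": "TrieMemtable", "shards": 16},
        "skiplist": {"class": "SkipListMemtable"},
    },
    "memtable": {"template": "skiplist"},
}


def resolve_memtable(node_config, table_option=None):
    """Return the effective memtable settings for one table on one node.

    resolve_memtable is a hypothetical helper, named for this example.
    """
    templates = node_config.get("memtable_templates", {})
    # No per-table option in the schema: fall back to the node-wide default.
    option = table_option if table_option else node_config.get("memtable", {})
    if "template" in option:
        name = option["template"]
        if name not in templates:
            raise KeyError(f"unknown memtable template: {name}")
        # Copy so callers cannot mutate the node's template definition.
        return dict(templates[name])
    # Assumption: anything else is a full inline memtable configuration.
    return dict(option)


# Schema says WITH memtable = {'template' : 'trie'}
trie_settings = resolve_memtable(NODE_CONFIG, {"template": "trie"})
# Schema does not mention memtable at all
default_settings = resolve_memtable(NODE_CONFIG)
```

In this model a node can redefine what 'trie' means locally (for example a 
different shard count) without any schema change, which is the point of 
keeping the templates in cassandra.yaml.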

I intend to commit this modification with the memtable API 
(CASSANDRA-17034/CEP-11).

Performance comparisons will be published soon.

Regards,
Branimir

On Fri, Jan 14, 2022 at 4:15 PM Jeff Jirsa 
<jji...@gmail.com> wrote:
Sounds like a great addition

Can you share with the list some of the details around GC and latency 
improvements you’ve observed?

Any specific reason the configuration is through schema vs yaml? Presumably it’s 
so a user can test per table, but this changes every host in a cluster, so the 
impact of a bug/regression is much higher.

On Jan 10, 2022, at 1:30 AM, Branimir Lambov 
<blam...@apache.org> wrote:

We would like to contribute our TrieMemtable to Cassandra.

https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-19%3A+Trie+memtable+implementation

This is a new memtable solution intended to replace the legacy implementation, 
developed with the following objectives:
- lowering the on-heap complexity and making it possible to store memtable 
indexing structures off-heap,
- leveraging byte order and a trie structure to lower the memory footprint and 
improve mutation and lookup performance.

The new memtable relies on CASSANDRA-6936 to translate to and from byte-ordered 
representations of types, and CASSANDRA-17034 / CEP-11 to plug into Cassandra. 
The memtable is built on multiple shards of custom in-memory single-writer 
multiple-reader tries, whose implementation uses a combination of 
state-of-the-art and novel features for greater efficiency.
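
For readers unfamiliar with the idea, the following toy Python trie shows why 
byte-ordered keys and a trie fit together: keys with a common prefix share 
nodes, and an in-order walk returns keys in unsigned byte order. It is a 
teaching sketch only. The class name ByteTrie and its methods are invented for 
this example, and it does not model sharding, concurrency, off-heap storage, or 
anything else specific to the actual TrieMemtable.

```python
class ByteTrie:
    """Toy trie keyed by bytes; children are indexed by the next byte value."""

    def __init__(self):
        self.children = {}      # byte value (int 0..255) -> ByteTrie
        self.value = None
        self.has_value = False  # distinguishes "no entry" from a stored None

    def put(self, key: bytes, value):
        node = self
        for b in key:
            # Keys sharing a prefix reuse the same chain of nodes.
            node = node.children.setdefault(b, ByteTrie())
        node.value = value
        node.has_value = True

    def get(self, key: bytes, default=None):
        node = self
        for b in key:
            node = node.children.get(b)
            if node is None:
                return default
        return node.value if node.has_value else default

    def items(self, prefix: bytes = b""):
        """Yield (key, value) pairs in unsigned byte order."""
        if self.has_value:
            yield prefix, self.value
        for b in sorted(self.children):
            yield from self.children[b].items(prefix + bytes([b]))


trie = ByteTrie()
trie.put(b"banana", 1)
trie.put(b"apple", 2)
trie.put(b"app", 3)
ordered_keys = [key for key, _ in trie.items()]
```

Here ordered_keys comes back sorted by byte value regardless of insertion 
order, which is the property a byte-ordered key representation provides.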

The CEP's JIRA ticket (https://issues.apache.org/jira/browse/CASSANDRA-17240) 
contains the initial version of the implementation. In its current form it 
achieves much better garbage collection latency, significantly bigger data 
sizes between flushes for the same memory allocation, as well as drastically 
increased write throughput, and we expect the memory and garbage collection 
improvements to go much further with upcoming improvements to the solution.

I am interested in hearing your thoughts on the proposal.

Regards,
Branimir
