Re: [DISCUSS] CEP-19: Trie memtable implementation

Bowen Song Wed, 09 Feb 2022 02:13:59 -0800

TBH, I don't have an opinion on the configuration. I just want to say that if at the end we decide the configuration in the YAML should override the table schema, I would like to recommend that we specify a list of whitelisted (or blacklisted) "templates" in the YAML file. The template chosen by the table schema is used if it's enabled; otherwise we fall back to a default template, which could be the first element in the whitelist if that's used, or a separate configuration entry if a blacklist is used. The list should be optional in the YAML, and an empty list or the absence of it means everything is enabled.


Advantages of this:

1. it doesn't require the operator to configure this, as an empty or absent list by default enables all templates and should work fine in most cases.

2. it allows the operator to whitelist / blacklist any template if ever needed (e.g. due to a bug), and also allows them to choose a fallback option.

3. the table schema has priority as long as the chosen template is not explicitly disabled by the YAML.

4. it allows the operator to selectively disable some templates without forcing all tables to use the same template specified by the YAML.
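
A rough Python sketch of the selection rule described above, for illustration only. None of this is Cassandra code: `resolve_template` and its parameters (`known`, `allowed`, `denied`, `fallback`) are hypothetical names. Two behaviours are assumptions that the proposal does not spell out: rejecting a configuration that sets both lists, and using the first enabled template as the default when no `fallback` is given.

```python
from typing import Optional, Sequence


def resolve_template(
    requested: Optional[str],
    known: Sequence[str],
    allowed: Sequence[str] = (),
    denied: Sequence[str] = (),
    fallback: Optional[str] = None,
) -> str:
    """Return the template a table would use on this node (hypothetical helper)."""
    if allowed and denied:
        # Assumption: the two lists are alternatives, not combined.
        raise ValueError("configure either a whitelist or a blacklist, not both")

    if allowed:
        # Whitelist: only listed templates are enabled; the first entry is the default.
        enabled = [t for t in known if t in allowed]
        default = allowed[0]
    else:
        # Blacklist, or no list at all (an empty list means everything is enabled).
        enabled = [t for t in known if t not in denied]
        # Assumption: without an explicit fallback, use the first enabled template.
        default = fallback if fallback is not None else (enabled[0] if enabled else None)

    if default not in enabled:
        raise ValueError("the default template must itself be enabled")

    # The table schema wins as long as its choice is enabled on this node.
    return requested if requested in enabled else default


known = ["skiplist", "trie"]
print(resolve_template("trie", known))                        # schema choice honoured
print(resolve_template("trie", known, allowed=["skiplist"]))  # falls back to whitelist head
print(resolve_template("trie", known, denied=["trie"], fallback="skiplist"))
```

With no list configured the schema's choice is always honoured, which matches point 1; points 2 to 4 correspond to the whitelist and blacklist branches.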



On 09/02/2022 09:43, bened...@apache.org wrote:

Why not have some default templates that can be specified by the schema without touching the yaml, but overridden in the yaml as necessary?


*From: *Branimir Lambov <blam...@apache.org>
*Date: *Wednesday, 9 February 2022 at 09:35
*To: *dev@cassandra.apache.org <dev@cassandra.apache.org>
*Subject: *Re: [DISCUSS] CEP-19: Trie memtable implementation

If I understand this correctly, you prefer _not_ to have an option to give the configuration explicitly in the schema. I.e. force the configurations ("templates" in current terms) to be specified in the yaml, and only allow tables to specify which one to use among them?


This does sound at least as good to me, and I'll happily change the API.

Regards,

Branimir

On Tue, Feb 8, 2022 at 10:40 PM Dinesh Joshi <djo...@apache.org> wrote:

    My quick reading of the code suggests that the schema will override
    the operator's default preference in the YAML. In the event of a
    bug in the new implementation, there could be a situation where the
    operator might need to override this via the YAML.



        On Feb 8, 2022, at 12:29 PM, Jeremiah D Jordan
        <jeremiah.jor...@gmail.com> wrote:

        I don’t really see most users touching the default
        implementation.  I would expect the main reason someone would
        change would be

        1. They run into some bug that is only in one of the
        implementations.

        2. They have persistent memory and so want to use
        https://issues.apache.org/jira/browse/CASSANDRA-13981

        Given that I doubt most people will touch it, I think it is
        good to give advanced operators the ability to have more
        control over switching to things that have new performance
        characteristics.  So I like the proposed configuration
        approach, which allows someone to change to a new
        implementation one node at a time and only for specific tables.



            On Feb 8, 2022, at 2:21 PM, Dinesh Joshi
            <djo...@apache.org> wrote:

            Thank you for sharing the perf test results.

            Going back to the schema vs yaml configuration. I am
            concerned users may pick the wrong implementation for
            their use-case. Is there any chance for us to
            automatically pick a MemTable implementation based on
            heuristics? Do we foresee users ever picking the existing
            SkipList implementation over the Trie? Given the
            performance tests, it seems the Trie implementation is the
            clear winner.

            To be clear, I am not suggesting we remove the existing
            implementation. I am for maintaining a pluggable API for
            various components.

            Dinesh



                On Feb 7, 2022, at 8:39 AM, Branimir Lambov
                <blam...@apache.org> wrote:

                Added some performance results to the ticket:
                https://issues.apache.org/jira/browse/CASSANDRA-17240

                Regards,

                Branimir

                On Sat, Feb 5, 2022 at 10:59 PM Dinesh Joshi
                <djo...@apache.org> wrote:

                    This is excellent. Thanks for opening up this CEP.
                    It would be great to get some stats around GC
                    allocation rate / memory pressure, read & write
                    latencies, etc. compared to the existing implementation.

                    Dinesh



                        On Jan 18, 2022, at 2:13 AM, Branimir Lambov
                        <blam...@apache.org> wrote:

                        The memtable pluggability API (CEP-11) is
                        per-table to enable memtable selection
                        that suits specific workflows. It also makes
                        full sense to permit per-node configuration,
                        both to be able to modify the configuration to
                        suit heterogeneous deployments better, as well
                        as to test changes for improvements such as
                        this one.

                        Recognizing this, the patch comes with a
                        modification to the API
                        <https://github.com/blambov/cassandra/commit/24b558ba2f71a2f040804e28993cc914b31298f5>
                        that defines memtable templates in
                        cassandra.yaml (i.e. per node) and allows the
                        schema to select a template (in addition to
                        being able to specify the full memtable
                        configuration). One could use this e.g. by adding:

                        memtable_templates:
                             trie:
                                 class: TrieMemtable
                                 shards: 16
                             skiplist:
                                 class: SkipListMemtable
                        memtable:
                             template: skiplist

                        (which defines two templates and specifies the
                        default memtable implementation to use) to
                        cassandra.yaml and specifying WITH memtable =
                        {'template' : 'trie'} in the table schema.
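
A minimal Python sketch of how such a lookup might behave, using plain dicts as stand-ins for the cassandra.yaml fragment and the table's memtable option. The helper `effective_memtable` and the constant `NODE_CONFIG` are hypothetical names, not part of the patch, and raising an error for a template the node does not define is an assumption made here for illustration.

```python
# Plain-dict stand-in for the cassandra.yaml fragment quoted above.
NODE_CONFIG = {
    "memtable_templates": {
        "trie": {"class": "TrieMemtable", "shards": 16},
        "skiplist": {"class": "SkipListMemtable"},
    },
    "memtable": {"template": "skiplist"},
}


def effective_memtable(table_options, node_config=NODE_CONFIG):
    """Return the memtable configuration a table would get on this node."""
    templates = node_config["memtable_templates"]

    if not table_options:
        # No per-table setting: use the node's default template.
        return dict(templates[node_config["memtable"]["template"]])

    if "template" in table_options:
        name = table_options["template"]
        if name not in templates:
            # Assumption: an unknown template is an error rather than a silent fallback.
            raise KeyError(f"memtable template {name!r} is not defined on this node")
        return dict(templates[name])

    # Otherwise the schema carries a full memtable configuration of its own.
    return dict(table_options)


print(effective_memtable({"template": "trie"}))   # WITH memtable = {'template' : 'trie'}
print(effective_memtable(None))                   # table without a memtable option
```

Because the templates live in each node's own configuration, the same schema option can resolve to different settings on different nodes, which is what makes per-node testing possible.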

                        I intend to commit this modification with the
                        memtable API (CASSANDRA-17034/CEP-11).

                        Performance comparisons will be published soon.

                        Regards,

                        Branimir

                        On Fri, Jan 14, 2022 at 4:15 PM Jeff Jirsa
                        <jji...@gmail.com> wrote:

                            Sounds like a great addition

                            Can you share some of the details around
                            gc and latency improvements you’ve
                            observed with the list?

                            Any specific reason the configuration is
                            through schema vs yaml? Presumably it’s so
                            a user can test per table, but this
                            changes every host in a cluster, so the
                            impact of a bug/regression is much higher.



                                On Jan 10, 2022, at 1:30 AM, Branimir
                                Lambov <blam...@apache.org> wrote:

                                

                                We would like to contribute our
                                TrieMemtable to Cassandra.

                                https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-19%3A+Trie+memtable+implementation

                                This is a new memtable solution aimed
                                to replace the legacy implementation,
                                developed with the following objectives:

                                - lowering the on-heap complexity and
                                the ability to store memtable indexing
                                structures off-heap,

                                - leveraging byte order and a trie
                                structure to lower the memory
                                footprint and improve mutation and
                                lookup performance.

                                The new memtable relies on
                                CASSANDRA-6936 to translate to and
                                from byte-ordered representations of
                                types, and CASSANDRA-17034 / CEP-11 to
                                plug into Cassandra. The memtable is
                                built on multiple shards of custom
                                in-memory single-writer
                                multiple-reader tries, whose
                                implementation uses a combination of
                                state-of-the-art and novel features
                                for greater efficiency.

                                The CEP's JIRA ticket
                                (https://issues.apache.org/jira/browse/CASSANDRA-17240)
                                contains the initial version of the
                                implementation. In its current form it
                                achieves much better garbage
                                collection latency, significantly
                                bigger data sizes between flushes for
                                the same memory allocation, as well as
                                drastically increased write
                                throughput, and we expect the memory
                                and garbage collection improvements to
                                go much further with upcoming
                                improvements to the solution.

                                I am interested in hearing your
                                thoughts on the proposal.

                                Regards,

                                Branimir
