Re: [DISCUSS] CEP-11: Pluggable memtable implementations

Michael Burman Thu, 22 Jul 2021 05:59:00 -0700

On Wed, 21 Jul 2021 at 17:24, Branimir Lambov <[email protected]>
wrote:


> > Why is flushing control bad to do in CFS and better in the
>   memtable?
>
> I wonder why you would understand this as something that takes away
> control instead of giving it. The CFS is not configurable. With the
> CEP, memtables are configurable at the table level. It is entirely
> possible to implement a memtable wrapper that provides any of the
> examples of functionalities you mention -- and that would be fully
> configurable (just as example, one could very well select a
> time-series-optimized-flush wrapper over skip-list memtable).
>
>
I think this was a bit of miscommunication. I'm not in favor of keeping it
in the CFS, but at least to me (as a reader) the CEP reads as if the
flushing behavior becomes tied to the Memtable implementation rather than
being configurable at the table level. That would not reduce the coupling
of flush strategies, it would just move it from the CFS to the Memtable
implementation. And with multiple Memtable implementations, the reusable
parts of flushing could end up being difficult to reuse. If that's not the
intention, then good.
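
To make the concern concrete, here is a rough sketch of what I mean by a
flush strategy that is picked per table and reused across memtable
implementations. Every name in it (SimpleMemtable, FlushPolicy,
SizeThresholdPolicy, MapMemtable, TableWriter) is made up for illustration;
none of this is the CEP's interface or existing Cassandra code.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical names throughout; not Cassandra or CEP-11 APIs.
interface SimpleMemtable {
    void put(String key, String value);
    long liveBytes();
}

// The flush decision lives outside the memtable and is chosen per table.
interface FlushPolicy {
    boolean shouldFlush(SimpleMemtable memtable);
}

final class SizeThresholdPolicy implements FlushPolicy {
    private final long maxBytes;
    SizeThresholdPolicy(long maxBytes) { this.maxBytes = maxBytes; }
    @Override
    public boolean shouldFlush(SimpleMemtable memtable) {
        return memtable.liveBytes() >= maxBytes;
    }
}

// One possible memtable; any other implementation works with the same policy.
final class MapMemtable implements SimpleMemtable {
    private final Map<String, String> data = new TreeMap<>();
    private long bytes;
    @Override
    public void put(String key, String value) {
        String old = data.put(key, value);
        // New key: count key + value. Overwrite: count only the size change.
        bytes += value.length() + (old == null ? key.length() : -old.length());
    }
    @Override
    public long liveBytes() { return bytes; }
}

// Stand-in for the table-level write path that owns the policy.
final class TableWriter {
    private final Supplier<SimpleMemtable> memtableFactory;
    private final FlushPolicy policy;
    private SimpleMemtable current;
    private int flushes;

    TableWriter(Supplier<SimpleMemtable> memtableFactory, FlushPolicy policy) {
        this.memtableFactory = memtableFactory;
        this.policy = policy;
        this.current = memtableFactory.get();
    }

    void write(String key, String value) {
        current.put(key, value);
        if (policy.shouldFlush(current)) {
            flushes++;                       // a real flush would write an sstable here
            current = memtableFactory.get(); // switch to a fresh memtable
        }
    }

    int flushCount() { return flushes; }
}
```

With this shape, swapping MapMemtable for another implementation does not
touch the policy, and a time-series-oriented policy could be added without
touching any memtable.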


>
> This is another question that the proposal leaves to the memtable
> implementation (or wrapper), but it does make sense to make sure the
> interfaces provide the necessary support for sharding
>

+1 to this, that's a good limitation of scope to move forward. I think this
was originally touched on in 7282 (where I had it in the memtable impl), but
then got pushed one step outside.

> writesShouldSkipCommitLog is a result of scope reduction (call it
> laziness on my part). I could not find a way to tell if commit log
> data may be required for point-in-time-restore or any other feature,
> and the existing method of turning the commit log off does not have
> the right granularity. I am very open to suggestions here.
>

Could this be limited to a single parameter? I'm not sure the combination
of "isDurable" + "shouldSkip" is more useful than a single "shouldWrite"
(etc). But I also wonder how one could achieve point-in-time restore
without a commit log in the cases where it is required (can a persistent
memory memtable be rolled back?). That does have an effect on backups. I
have to read your impl to see how you intended to rewrite the process from
Keyspace (which is where the requirement for "isDurable" starts).

Although I do feel like the persistent memory exceptions make things more
complex.
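
As a sketch of the "single parameter" idea, the write path could derive one
answer from the table's settings plus one capability reported by the
memtable. The names here (WritePath, TableSettings, shouldWriteCommitLog,
and both flags) are my own placeholders and assumptions, not anything from
the CEP or the current code.

```java
// Hypothetical sketch; not Cassandra code and not part of CEP-11.
final class WritePath {

    // Assumed table-level settings, configured independently of the memtable.
    static final class TableSettings {
        final boolean commitLogEnabled;
        final boolean needsLogForRestoreOrCapture; // e.g. point-in-time restore

        TableSettings(boolean commitLogEnabled, boolean needsLogForRestoreOrCapture) {
            this.commitLogEnabled = commitLogEnabled;
            this.needsLogForRestoreOrCapture = needsLogForRestoreOrCapture;
        }
    }

    // The single decision the write path needs for each write.
    static boolean shouldWriteCommitLog(TableSettings table, boolean memtableIsDurable) {
        if (!table.commitLogEnabled) {
            return false;               // table opted out entirely
        }
        if (table.needsLogForRestoreOrCapture) {
            return true;                // log has a use beyond durability
        }
        return !memtableIsDurable;      // otherwise only needed for durability
    }
}
```

The memtable only states whether it is durable on its own; whether the log
is written stays a table-level decision, so a persistent memory memtable can
still have a commit log for capture or shipping.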



>
>
>
> > Why is streaming in the memtable? [...] the wanted behavior is just
>   disabling automated flushing
>
> Yes, if zero-copy-streaming is not enabled. And that's exactly what
> this method is there for -- to make sure sstables are not copied
> whole, and that a flush is not done at the end.
>
> Regards,
> Branimir
>
> On Wed, Jul 21, 2021 at 4:33 PM [email protected] <[email protected]>
> wrote:
>
> > I would love to help out with this in any way that I can, FYI. Definitely
> > one of the more impactful performance improvements to the codebase, given
> > the benefits to compaction and memory behaviour.
> >
> > From: [email protected] <[email protected]>
> > Date: Wednesday, 21 July 2021 at 14:32
> > To: [email protected] <[email protected]>
> > Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> > > memtable-as-a-commitlog-index
> >
> > Heh, based on 7282? Yeah, I’ve had this idea for a while now (actually
> > there was a paper that did this a long time ago), and it could be very
> > nice (if for no other benefit than reducing heap utilisation). I don’t
> > think this requires that they be modelled as the same concept, however,
> > only that the Memtable must be able to receive an address into a commit
> > log entry and to adopt partial ownership over the entry’s lifecycle.
> >
> >
> > From: Branimir Lambov <[email protected]>
> > Date: Wednesday, 21 July 2021 at 14:28
> > To: [email protected] <[email protected]>
> > Subject: Re: [DISCUSS] CEP-11: Pluggable memtable implementations
> > > In general, I think we need to make up our mind as to whether we
> >   consider the Memtable and CommitLog one logical entity [...], or
> >   whether we want to further untangle those two components from an
> >   architectural perspective which we started down that road on with
> >   the pluggable storage engine work.
> >
> > This CEP is intentionally not attempting to answer this question. FWIW
> > I do not see them as separable (there's evidence to this fact in the
> > codebase), but there are valid secondary uses of the commit log that
> > are served well enough by the current architecture.
> >
> > It is important, however, to let the memtable implementation opt out,
> > to permit it to provide its own solution for data persistence.
> >
> > We should revisit this in the future, especially if Benedict's shared
> > log facility and my plans for a memtable-as-a-commitlog-index
> > evolve.
> >
> > Regards,
> > Branimir
> >
> > On Wed, Jul 21, 2021 at 1:34 PM Michael Burman <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > It is nice to see these going forward (and a great use of CEP) so
> > > thanks for the proposal. I have my reservations regarding the linking
> > > of memtable to CommitLog and flushing and should not leak abstraction
> > > from one to another. And I don't see the reasoning why they should be,
> > > it doesn't seem to add anything else than tight coupling of components,
> > > reducing reuse and making things unnecessarily complicated. Also, the
> > > streaming notions seem weird to me - how are they related to memtable?
> > > Why should memtable care about the behavior outside memtable's
> > > responsibility?
> > >
> > > Some misc (with some thoughts split / duplicated to different parts)
> > quotes
> > > and comments:
> > >
> > > > Tight coupling between CFS and memtable will be reduced: flushing
> > > functionality is to be extracted, controlling memtable memory and
> period
> > > expiration will be handled by the memtable.
> > >
> > > Why is flushing control bad to do in CFS and better in the memtable?
> > Doing
> > > it outside memtable would allow to control the flushing regardless of
> how
> > > the actual memtable is implemented. For example, lets say someone would
> > > want to implement the HBase's accordion to Cassandra. It shouldn't
> matter
> > > what the implementation of memtable is as the compaction of different
> > > memtables could be beneficial to all implementations. Or the flushing
> > would
> > > push the memtable to a proper caching instead of only to disk.
> > >
> > > Or if we had per table caching structure, we could control the flushing
> > of
> > > memtables and the cache structure separately. Some data benefits from
> LRU
> > > and some from MRW (most-recently-written) caching strategies. But both
> > > could benefit from the same memtable implementation, it's the data and
> > how
> > > its used that could control how the flushing should work. For example
> > time
> > > series data behaves quite differently in terms of data accesses to
> > > something more "random".
> > >
> > > Or even "total memory control" which would check which tables need more
> > > memory to do their writes and which do not. Or that the memory doesn't
> > grow
> > > over a boundary and needs to manually maintain how much is dedicated to
> > > caching and how much to memtables waiting to be flushed. Or delay
> > flushing
> > > because the disks can't keep up etc. Not to be implemented in this CEP,
> > but
> > > pushing this strategy to memtable would prevent many features.
> > >
> > > > Beyond thread-safety, the concurrency constraints of the memtable are
> > > intentionally left unspecified.
> > >
> > > I like this. I could see use-cases where a single-thread implementation
> > > could actually outperform some concurrent data structures. But it also
> > > provides me with a question, is this proposal going to take an angle
> > > towards per-range memtables? There are certainly benefits to splitting
> > the
> > > memtables as it would reduce the "n" in the operations, thus providing
> > less
> > > overhead in lookups and writes. Although, taking it one step backwards
> I
> > > could see the benefit of having a commitlog per range also, which would
> > > allow higher utilization of NVME drives with larger queue depths. And
> why
> > > not per-range-sstables for faster scale-outs and .. a bit outside the
> > scope
> > > of CEP, but just to ensure that the implementation does not block such
> > > improvement.
> > >
> > > Interfaces:
> > >
> > > > boolean writesAreDurable()
> > > > boolean writesShouldSkipCommitLog()
> > >
> > > The placement inside memtable implementation for these methods just
> feels
> > > incredibly wrong to me. The writing pipeline should have these
> configured
> > > and they could differ for each table even with the same memtable
> > > implementation. Lets take the example of an in-memory memtable use case
> > > that's never written to a SSTable. We could have one table with just
> > simply
> > > in-memory cached storage and another one with a Redis style persistence
> > of
> > > AOF, where writes would be written to the commitlog for fast recovery,
> > but
> > > the data is otherwise always only kept in the memtable instead of
> writing
> > > to the SSTable (for performance reasons). Same implementation of
> memtable
> > > still.
> > >
> > > Why would the write process of the table not ask the table what
> settings
> > it
> > > has and instead asks the memtable what settings the table has? This
> seems
> > > counterintuitive to me. Even the persistent memory case is a bit
> > > questionable, why not simply disable commitlog in the writing process?
> > Why
> > > ask the memtable?
> > >
> > > This feels like memtable is going to be the write pipeline, but to me
> > that
> > > doesn't feel like the correct architectural decision. I'd rather see
> > these
> > > decisions done outside the memtable. Even a persistent memory memtable
> > user
> > > might want to have a commitlog enabled for data capture / shipping
> logs,
> > or
> > > layers of persistence speed. The whole persistent memory without any
> > > commercially known future is a bit weird at the moment (even Optane has
> > no
> > > known manufacturing anymore with last factory being dismantled based on
> > > public information).
> > >
> > > > boolean streamToMemtable()
> > >
> > > And that one I don't understand. Why is streaming in the memtable? This
> > > smells like a scope creep from something else. The explanation would
> > > indicate to me that the wanted behavior is just disabling automated
> > > flushing.
> > >
> > > But these are just some questions that came to my mind while reading
> > this.
> > > And I don't want to sound too negative (most of the features are really
> > > something I'd like to see), perhaps I just misunderstood some of the
> > > motivations why stuff should be brought to memtable instead of being
> > > implemented outside memtable. Perhaps there's something else in the
> write
> > > pipeline arch that needs fixing but is now masqueraded inside this CEP.
> > >
> > > I'm definitely interested to hear more.
> > >
> > >   - Micke
> > >
> > > On Wed, 21 Jul 2021 at 08:24, Berenguer Blasi <
> [email protected]>
> > > wrote:
> > >
> > > > +1. De-tangling, going more modular and clean interfaces sgtm.
> > > >
> > > > On 20/7/21 21:45, Nate McCall wrote:
> > > > > Yay for pluggable memtables!! I havent gone over this in detail
> yet,
> > > but
> > > > > personally I've always thought integrating something like Arrow
> would
> > > be
> > > > > cool for sharing data (that's as far as i've gotten, but anything
> > that
> > > > > makes that kind of experimentation easier would also help with
> > mocking
> > > > test
> > > > > plumbing, so +1 from me).
> > > > >
> > > > > Thanks for putting this together!
> > > > >
> > > > > -Nate
> > > > >
> > > > > On Tue, Jul 20, 2021 at 10:11 PM Branimir Lambov <
> > > > > [email protected]> wrote:
> > > > >
> > > > >> Proposal for a mechanism for plugging in memtable implementations:
> > > > >>
> > > > >>
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-11%3A+Pluggable+memtable+implementations
> > > > >>
> > > > >> The proposal supports using custom memtable implementations to
> > support
> > > > >> development and testing of improved alternatives, but also
> enables a
> > > > >> broader definition of "memtable" to better support more advanced
> use
> > > > cases
> > > > >> like persistent memory. To this end, memtable implementations are
> > > given
> > > > >> control over flushing and storing data in the commit log, enabling
> > > > >> solutions that implement their own durability mechanisms and live
> > much
> > > > >> longer than their classical counterparts. Taken to the extreme,
> this
> > > > also
> > > > >> enables memtables that never flush (in other words, alternative
> > > storage
> > > > >> engines) in a minimally-invasive manner.
> > > > >>
> > > > >> I am curious to hear your thoughts on the proposal.
> > > > >>
> > > > >> Regards,
> > > > >> Branimir
> > > > >>
> > > >
> > > > ---------------------------------------------------------------------
> > > > To unsubscribe, e-mail: [email protected]
> > > > For additional commands, e-mail: [email protected]
> > > >
> > > >
> > >
> >
> >
> > --
> > Branimir Lambov
> > e. [email protected]
> > w. www.datastax.com<http://www.datastax.com>
> >
>
>
> --
> Branimir Lambov
> e. [email protected]
> w. www.datastax.com
>
