Re: DateTieredCompactionStrategy and static columns

Jonathan Haddad Fri, 01 May 2015 06:45:18 -0700

I think what Benedict has described feels very much like a very specialized
version of the following:


1. Updates to different tables in a batch become atomic if the node is a
replica for the partition
2. Supporting inner joins if the partition key is the same in both tables.

I'd rather see join support personally :)
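
To make item 2 concrete, here is a rough Python sketch of an inner join
restricted to rows that share a partition key. It is only an illustration of
the idea, not anything that exists in Cassandra: the function name, the
dict-based rows, and the listing_id/price/delta columns are all made up for
the example. The appeal of the restriction is that, in principle, both sides
of the join live on the same replicas, so no cross-node shuffle is needed.

```python
from collections import defaultdict


def inner_join_same_partition(rows_a, rows_b, partition_key):
    """Inner-join two lists of dict rows on a shared partition key column.

    Hypothetical helper for illustration only; rows are plain dicts.
    """
    # Index the second table by partition key so each lookup is O(1).
    by_key = defaultdict(list)
    for row in rows_b:
        by_key[row[partition_key]].append(row)

    joined = []
    for row in rows_a:
        # Only partitions present in BOTH tables produce output rows.
        for match in by_key.get(row[partition_key], []):
            joined.append({**row, **match})
    return joined


# Made-up data: current listing state and a table of field-level deltas.
current = [{"listing_id": 1, "price": 100}, {"listing_id": 2, "price": 200}]
history = [
    {"listing_id": 1, "ts": 10, "delta": "price+5"},
    {"listing_id": 3, "ts": 11, "delta": "price-1"},
]
joined_rows = inner_join_same_partition(current, history, "listing_id")
```

Only listing 1 appears in both inputs, so it is the only partition that
contributes a joined row.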

Jon

On Fri, May 1, 2015 at 6:38 AM graham sanderson <gra...@vast.com> wrote:

> I 100% agree with Benedict, but just to be clear about my use case
>
> 1) We have state of, let's say, real estate listings
> 2) We get field level deltas for them
> 3) Previously we would store the base state and all the deltas in the
> partition and roll them up from the beginning of time (this was a prototype
> and silly since there was no expiration strategy)
> 4) Preferred plan is to keep current state in a static map (i.e. one delta
> field only updates one cell) - we are MVCC but in the common case the
> latest version will be what we want
> 5) However we require history, so we’d use the partition to keep TTL
> deltas going backwards from the now state - this seems like a common
> pattern people would want. Note also that sometimes we might need to apply
> reverse deltas if C* is ahead of our SOLR indexes
>
> The static columns and the regular columns ARE completely different in
> behavior/lifecycle, so I’d definitely vote for them being treated as such.
>
>
> > On May 1, 2015, at 7:27 AM, Benedict Elliott Smith
> > <belliottsm...@datastax.com> wrote:
> >
> >> How would it be different from creating an actual real extra table
> >> instead?
> >
> >> There's nothing that warrants making the codebase more complex to
> >> accomplish something it already does.
> >
> > As far as I was aware, the only point of static columns was to support the
> > thrift ability to mutate and read them in the same expression, with
> > atomicity and isolation. As to whether or not it is more complex, I'm not
> > at all convinced that it would be. We have had a lot of unexpected special
> > casing added to ensure they behave correctly (e.g. paging is broken), and
> > have complicated the comparison/slice logic to accommodate them, so that it
> > is harder to reason about (and to optimise). They also have very different
> > compaction characteristics, so the complexity on the user is increased
> > without their necessarily realising it. All told, it introduces a lot more
> > subtlety of behaviour than there would be with a separate set of sstables,
> > or perhaps a separate file attached to each sstable.
> >
> > Of course, we've already implemented it as a specialisation of the
> > slice/comparator, I think because it seemed like the least frictional path
> > to do so, but that doesn't mean it is the least complex. It does mean it's
> > the least work (assuming we're now on top of the bugs), which is its own
> > virtue.
> >
> > There are some advantages to having them managed separately, and advantages
> > to having them combined. Combined, for small partitions, they can be read
> > in the same seek. However for large partitions this is no longer true, and
> > we may behave much worse by polluting the page cache with lots of unwanted
> > data that is adjacent to the static columns. If they were managed
> > separately, the page cache would be populated mostly with other static
> > columns, which may be more likely of use. We could quite easily have a
> > "static column" cache, also, and completely avoid merging them. Or at least
> > we could easily read them with collectTimeOrderedData instead of
> > collectAllData semantics.
> >
> > All told, it certainly isn't a terrible idea, and shouldn't be dismissed so
> > readily. Personally I think in the long run whether or not we manage static
> > columns together with non-static columns is dependent on if we intend to
> > add tiered "static" columns (i.e., if each level of clustering component
> > can have columns associated with it). If we do, we should definitely keep
> > it all inline. If not, it probably permits a lot better behaviour to
> > separate them, since it's easier to reason about and improve their distinct
> > characteristics.
> >
> >
> > On Fri, May 1, 2015 at 1:24 AM, graham sanderson <gra...@vast.com>
> > wrote:
> >
> >> Well you lose the atomicity and isolation, but in this case that is
> >> probably fine
> >>
> >> That said, in every interaction I’ve had with static columns, they seem to
> >> be an odd duck (e.g. adding or complicating range slices), perhaps worthy
> >> of their own code path and sstables. Just food for thought.
> >>
> >>> On Apr 30, 2015, at 7:13 PM, Jonathan Haddad <j...@jonhaddad.com>
> >>> wrote:
> >>>
> >>> If you want it in a separate sstable, just use a separate table. There's
> >>> nothing that warrants making the codebase more complex to accomplish
> >>> something it already does.
> >>>
> >>> On Thu, Apr 30, 2015 at 5:07 PM graham sanderson <gra...@vast.com>
> >>> wrote:
> >>>
> >>>> Anyone here have an opinion; how realistic would it be to have a separate
> >>>> memtable/sstable for static columns?
> >>>>
> >>>> Begin forwarded message:
> >>>>
> >>>> *From: *Jonathan Haddad <j...@jonhaddad.com>
> >>>> *Subject: **Re: DateTieredCompactionStrategy and static columns*
> >>>> *Date: *April 30, 2015 at 3:55:46 PM CDT
> >>>> *To: *u...@cassandra.apache.org
> >>>> *Reply-To: *u...@cassandra.apache.org
> >>>>
> >>>>
> >>>> I suspect this will kill the benefit of DTCS, but haven't tested it to be
> >>>> 100% here.
> >>>>
> >>>> The benefit of DTCS is that sstables are selected for compaction based on
> >>>> the age of the data, not their size.  When you mix TTL'ed data and non
> >>>> TTL'ed data, you end up screwing with the "drop the entire SSTable"
> >>>> optimization.  I don't believe this is any different just because you're
> >>>> mixing in static columns.  What I think will happen is you'll end up with
> >>>> an sstable that's almost entirely TTL'ed with a few static columns that
> >>>> will never get compacted or dropped.  Pretty much the worst scenario I can
> >>>> think of.
> >>>>
> >>>>
> >>>>
> >>>> On Thu, Apr 30, 2015 at 11:21 AM graham sanderson <gra...@vast.com>
> >>>> wrote:
> >>>>
> >>>>> I have a potential use case I haven’t had a chance to prototype yet,
> >>>>> which would normally be a good candidate for DTCS (i.e. data delivered in
> >>>>> order and a fixed TTL), however with every write we’d also be updating
> >>>>> some static cells (namely a few key/values in a static map<text, text>
> >>>>> CQL column). There could also be explicit deletes of keys in the static
> >>>>> map, though that’s not 100% necessary.
> >>>>>
> >>>>> Since those columns don’t have TTL, without reading thru the code
> >>>>> and/or trying it, I have no idea what effect this has on DTCS (perhaps it
> >>>>> needs to use separate sstables for static columns). Has anyone tried
> >>>>> this? If not I eventually will and will report back.
> >>>>
> >>>>
> >>
> >>
>
>
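
The DTCS concern quoted above (an sstable that is almost entirely TTL'ed but
also holds a few static cells without a TTL) can be illustrated with a toy
Python model. This is a sketch under loud assumptions, not Cassandra's actual
logic: the Cell class and can_drop_whole_sstable function are hypothetical
names, times are plain integers, and real compaction also has to account for
things like gc_grace_seconds and overlapping sstables, which are ignored here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    """Toy stand-in for a cell; not a Cassandra type."""
    write_time: int
    ttl: Optional[int]  # None means the cell never expires

    def expires_at(self) -> Optional[int]:
        return None if self.ttl is None else self.write_time + self.ttl


def can_drop_whole_sstable(cells: List[Cell], now: int) -> bool:
    """True only if every cell has a TTL and that TTL has already elapsed."""
    for cell in cells:
        expiry = cell.expires_at()
        if expiry is None or expiry > now:
            # A single live or non-expiring cell pins the whole file.
            return False
    return True


# An sstable holding only TTL'ed deltas, and the same data plus one
# non-TTL static cell (as in the use case discussed in the thread).
ttl_only = [Cell(write_time=0, ttl=100), Cell(write_time=10, ttl=100)]
mixed = ttl_only + [Cell(write_time=5, ttl=None)]

droppable_ttl_only = can_drop_whole_sstable(ttl_only, now=200)
droppable_mixed = can_drop_whole_sstable(mixed, now=200)
```

In this model the TTL-only file becomes droppable once all of its cells have
expired, while the mixed file never does, however small the non-TTL fraction.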
