Re: DateTieredCompactionStrategy and static columns

Benedict Elliott Smith Fri, 01 May 2015 05:42:40 -0700

> How would it be different from creating an actual real extra table instead?

> There's nothing that warrants making the codebase more complex to
> accomplish something it already does.

As far as I was aware, the only point of static columns was to support the
thrift ability to mutate and read them in the same expression, with
atomicity and isolation. As to whether or not it is more complex, I'm not
at all convinced that it would be. We have had a lot of unexpected special
casing added to ensure they behave correctly (e.g. paging is broken), and
have complicated the comparison/slice logic to accommodate them, so that it
is harder to reason about (and to optimise). They also have very different
compaction characteristics, so the complexity for the user is increased
without their necessarily realising it. All told, it introduces a lot more
subtlety of behaviour than there would be with a separate set of sstables,
or perhaps a separate file attached to each sstable.
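
To make the compaction point concrete, here is a toy Python model of the
"drop the entire sstable" rule discussed further down the thread. It is
only a sketch, not Cassandra code: the Cell class, fully_expired() and the
timestamps are all hypothetical, and it ignores tombstone grace periods and
overlap with other sstables. It assumes an sstable can be dropped whole
only when every cell in it has a TTL that has elapsed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    """Hypothetical stand-in for a stored cell; not a Cassandra type."""
    name: str
    write_time: int
    ttl: Optional[int] = None  # None means the cell never expires

    def expired(self, now: int) -> bool:
        return self.ttl is not None and self.write_time + self.ttl <= now


def fully_expired(sstable: List[Cell], now: int) -> bool:
    """Toy rule: droppable as a whole only if every cell has expired."""
    return len(sstable) > 0 and all(cell.expired(now) for cell in sstable)


# Time-series rows written in order with a fixed TTL (the DTCS-friendly part).
regular = [Cell(f"row{i}", write_time=i, ttl=100) for i in range(5)]
# A static map entry updated alongside the rows, with no TTL (assumption).
static = [Cell("static_map[k]", write_time=4)]

now = 1000  # long after every TTL has elapsed

combined_droppable = fully_expired(regular + static, now)
separate_droppable = fully_expired(regular, now)
static_droppable = fully_expired(static, now)

print("combined sstable droppable:", combined_droppable)
print("regular-only sstable droppable:", separate_droppable)
print("static-only sstable droppable:", static_droppable)
```

In this model a single non-expiring static cell pins the combined sstable,
while a regular-only sstable can still be dropped whole.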

Of course, we've already implemented it as a specialisation of the
slice/comparator, I think because it seemed like the least frictional path
to do so, but that doesn't mean it is the least complex. It does mean it's
the least work (assuming we're now on top of the bugs), which is its own
virtue.

There are some advantages to having them managed separately, and advantages
to having them combined. Combined, for small partitions, they can be read
in the same seek. However for large partitions this is no longer true, and
we may behave much worse by polluting the page cache with lots of unwanted
data that is adjacent to the static columns. If they were managed
separately, the page cache would be populated mostly with other static
columns, which may be more likely of use. We could quite easily have a
"static column" cache, also, and completely avoid merging them. Or at least
we could easily read them with collectTimeOrderedData instead of
collectAllData semantics.
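
A rough back-of-the-envelope sketch of the page cache argument, in Python.
All numbers are made up for illustration (4096-byte pages, 64 bytes of
static data per partition, 1000 partitions), and it assumes that inline
static columns for large partitions each land on a different page, while a
separate store packs them contiguously with no per-entry overhead.

```python
import math

PAGE_BYTES = 4096      # assumed page size
STATIC_BYTES = 64      # assumed static data per partition
N_PARTITIONS = 1000    # assumed number of large partitions read


def pages_touched_inline(n_partitions: int) -> int:
    # Assumption: each partition is larger than a page, so each static
    # block shares its page only with that partition's unrelated row data.
    return n_partitions


def pages_touched_separate(n_partitions: int, static_bytes: int,
                           page_bytes: int) -> int:
    # Assumption: static blocks are stored back to back in their own file.
    return math.ceil(n_partitions * static_bytes / page_bytes)


inline_pages = pages_touched_inline(N_PARTITIONS)
separate_pages = pages_touched_separate(N_PARTITIONS, STATIC_BYTES, PAGE_BYTES)
useful_fraction_inline = STATIC_BYTES / PAGE_BYTES

print("pages cached, inline:", inline_pages)
print("pages cached, separate:", separate_pages)
print("useful fraction of each inline page:", useful_fraction_inline)
```

Under these assumptions the inline layout pulls in far more pages for the
same static data, and most of each page is data nobody asked for.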

All told, it certainly isn't a terrible idea, and shouldn't be dismissed so
readily. Personally, I think that in the long run whether we manage static
columns together with non-static columns depends on whether we intend to
add tiered "static" columns (i.e., if each level of clustering component
can have columns associated with it). If we do, we should definitely keep
it all inline. If not, it probably permits a lot better behaviour to
separate them, since it's easier to reason about and improve their distinct
characteristics.

On Fri, May 1, 2015 at 1:24 AM, graham sanderson <gra...@vast.com> wrote:

> Well you lose the atomicity and isolation, but in this case that is
> probably fine
>
> That said, in every interaction I’ve had with static columns, they seem to
> be an odd duck (e.g. adding or complicating range slices), perhaps worthy
> of their own code path and sstables. Just food for thought.
>
> > On Apr 30, 2015, at 7:13 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:
> >
> > If you want it in a separate sstable, just use a separate table.  There's
> > nothing that warrants making the codebase more complex to accomplish
> > something it already does.
> >
> > On Thu, Apr 30, 2015 at 5:07 PM graham sanderson <gra...@vast.com> wrote:
> >
> >> Anyone here have an opinion; how realistic would it be to have a separate
> >> memtable/sstable for static columns?
> >>
> >> Begin forwarded message:
> >>
> >> *From: *Jonathan Haddad <j...@jonhaddad.com>
> >> *Subject: **Re: DateTieredCompactionStrategy and static columns*
> >> *Date: *April 30, 2015 at 3:55:46 PM CDT
> >> *To: *u...@cassandra.apache.org
> >> *Reply-To: *u...@cassandra.apache.org
> >>
> >>
> >> I suspect this will kill the benefit of DTCS, but haven't tested it to be
> >> 100% sure.
> >>
> >> The benefit of DTCS is that sstables are selected for compaction based on
> >> the age of the data, not their size.  When you mix TTL'ed data and
> >> non-TTL'ed data, you end up screwing with the "drop the entire SSTable"
> >> optimization.  I don't believe this is any different just because you're
> >> mixing in static columns.  What I think will happen is you'll end up with
> >> an sstable that's almost entirely TTL'ed with a few static columns that
> >> will never get compacted or dropped.  Pretty much the worst scenario I can
> >> think of.
> >>
> >>
> >>
> >> On Thu, Apr 30, 2015 at 11:21 AM graham sanderson <gra...@vast.com> wrote:
> >>
> >>> I have a potential use case I haven’t had a chance to prototype yet,
> >>> which would normally be a good candidate for DTCS (i.e. data delivered in
> >>> order and a fixed TTL), however with every write we’d also be updating some
> >>> static cells (namely a few key/values in a static map<text, text> CQL
> >>> column). There could also be explicit deletes of keys in the static map,
> >>> though that’s not 100% necessary.
> >>>
> >>> Since those columns don’t have TTL, without reading through the code
> >>> and/or trying it, I have no idea what effect this has on DTCS (perhaps it
> >>> needs to use separate sstables for static columns). Has anyone tried this?
> >>> If not I eventually will and will report back.
> >>
> >>
>
>
