Re: Is SASI index in Cassandra efficient for high cardinality columns?

DuyHai Doan Fri, 21 Oct 2016 02:51:03 -0700

If you read my blog post about the secondary index deep dive, you'll find all
the answers.
On 21 Oct 2016 10:20, "Kant Kodali" <[email protected]> wrote:


> Why can't a secondary index be broken down into token ranges like the primary
> index, at least for exact matches? That way we wouldn't need to scan the whole
> cluster, at least for exact matches. I understand that for a substring search
> a value of length n has n(n+1)/2 substrings (on the order of n^2), which means
> that many hashes/tokens per value, and that can be a lot!
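To put the substring concern in numbers: a value of length n has n(n+1)/2 non-empty contiguous substrings when counted by position, which grows quadratically. 2^n is the number of subsets of character positions (subsequences), not substrings. The short Python check below enumerates both for an example word; the function names are illustrative only and have nothing to do with how SASI stores terms.

```python
def substrings_by_position(s: str) -> list:
    """All non-empty contiguous substrings of s, one per (start, end) pair."""
    n = len(s)
    return [s[i:j] for i in range(n) for j in range(i + 1, n + 1)]


def subset_count(n: int) -> int:
    """Number of subsets of n character positions (including the empty one)."""
    return 2 ** n


word = "cassandra"
n = len(word)
print(n, len(substrings_by_position(word)), n * (n + 1) // 2, subset_count(n))
# 9 45 45 512
```

Even at quadratic size, emitting one token per substring would multiply the index size per value, which is why the concern about "a lot" of tokens still holds.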
>
> On Sat, Oct 15, 2016 at 4:35 AM, DuyHai Doan <[email protected]> wrote:
>
> > If each indexed value has very few matching rows, then querying using SASI
> > (or any other secondary index implementation) may scan the whole cluster.
> >
> > This is because the indexes are "distributed", i.e. each node indexes only
> > its own data, so the index entries stay on the same nodes as the base data.
> > Even SASI, with its own data structures, will not help much here.
> >
> > One should understand that a secondary index query has to deal with two layers:
> >
> > 1) The cluster layer, which is common to every secondary index
> > implementation. Read my blog post here:
> > http://www.planetcassandra.org/blog/cassandra-native-secondary-index-deep-dive/
> >
> > 2) The local read path, which depends on the secondary index implementation.
> > Some use the Lucene library, like the Stratio implementation; others roll
> > their own data structures, like SASI.
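The two layers can be illustrated with a toy model. This is a minimal sketch under loudly stated assumptions: `Node` and `Cluster` are hypothetical classes, there is one replica per range (RF=1), rows are placed with `zlib.crc32` instead of Cassandra's partitioner, and the coordinator visits nodes in simple sequential rounds (the real coordinator estimates a concurrency factor and queries token ranges in parallel batches). What it does show is the effect described above: because each node only indexes its own rows, a value with a single matching row forces the coordinator to visit every node, while a value present in many rows can satisfy the `LIMIT` early.

```python
import zlib
from collections import defaultdict


class Node:
    """Hypothetical node holding base rows and a LOCAL index over those rows only."""

    def __init__(self, name):
        self.name = name
        self.rows = {}                       # partition key -> row
        self.local_index = defaultdict(set)  # indexed value -> partition keys on this node

    def write(self, pk, row, indexed_col):
        self.rows[pk] = row
        self.local_index[row[indexed_col]].add(pk)

    def local_lookup(self, value):
        # Layer 2, the local read path: a dict here; SASI or a Lucene-based index in practice
        return [self.rows[pk] for pk in sorted(self.local_index.get(value, ()))]


class Cluster:
    """Hypothetical cluster with one replica per range (RF=1) to keep the sketch short."""

    def __init__(self, num_nodes, indexed_col):
        self.nodes = [Node(f"node{i}") for i in range(num_nodes)]
        self.indexed_col = indexed_col

    def owner(self, pk):
        # Placement by partition key; crc32 stands in for Cassandra's partitioner
        return self.nodes[zlib.crc32(pk.encode()) % len(self.nodes)]

    def insert(self, pk, row):
        # The index entry lands on the same node as the base row
        self.owner(pk).write(pk, row, self.indexed_col)

    def query(self, value, limit):
        # Layer 1, the cluster layer: nothing tells the coordinator which nodes hold
        # matches, so it visits them in turn until it has `limit` rows or runs out
        results, contacted = [], 0
        for node in self.nodes:
            contacted += 1
            results.extend(node.local_lookup(value))
            if len(results) >= limit:
                break
        return results[:limit], contacted


by_email = Cluster(num_nodes=8, indexed_col="email")
by_country = Cluster(num_nodes=8, indexed_col="country")
for i in range(100):
    row = {"id": i, "email": f"user{i}@example.com", "country": "FR"}
    by_email.insert(f"user{i}", row)
    by_country.insert(f"user{i}", row)

rows, contacted = by_email.query("user42@example.com", limit=10)
print(len(rows), contacted)  # 1 8: a single match, yet every node was visited

rows, contacted = by_country.query("FR", limit=10)
print(len(rows), contacted)  # 10 rows, usually without visiting every node
```

Swapping the dict in `local_lookup` for a smarter structure (as SASI does) only speeds up each node's work; it does not reduce how many nodes the cluster layer has to contact.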
> >
> > If you have a 1-to-1 relationship between the indexed value and the matching
> > row (or 1-to-a-few), I would recommend using materialized views instead:
> >
> > http://www.slideshare.net/doanduyhai/sasi-cassandra-on-the-full-text-search-ride-voxxed-daybelgrade-2016/25
> >
> > Materialized views guarantee that for each searched value you only
> > hit a single node (or N replicas, depending on the consistency level used).
> >
> > However, materialized views have their own drawbacks (weaker consistency
> > guarantees), and you can't use range queries (<, >, ≤, ≥) or full-text
> > search on the indexed value.
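Why a view keyed by the indexed value reaches only one replica set can be sketched with a toy token ring. This is an illustration, not Cassandra code: `TokenRing` and `view_lookup_nodes` are hypothetical, tokens are derived from MD5 (Cassandra's default partitioner is Murmur3Partitioner), there are no vnodes, and replicas are placed on the next nodes clockwise in the style of SimpleStrategy. The point is that when the indexed value itself is the partition key, its token names the owning node directly, so an exact-match lookup never fans out.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring (hypothetical): each node owns the range ending at its token.

    MD5-derived 64-bit tokens are an assumption for illustration only.
    """

    def __init__(self, node_names):
        self.ring = sorted((self.token(name), name) for name in node_names)
        self.tokens = [t for t, _ in self.ring]

    @staticmethod
    def token(key: str) -> int:
        # First 8 bytes of the MD5 digest, read as an unsigned integer
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def replicas(self, key: str, rf: int) -> list:
        # Owner = first node whose token >= the key's token (wrapping around),
        # then the next rf - 1 nodes clockwise
        i = bisect.bisect_left(self.tokens, self.token(key)) % len(self.ring)
        return [self.ring[(i + k) % len(self.ring)][1] for k in range(rf)]


def view_lookup_nodes(ring: TokenRing, indexed_value: str, rf: int, cl_replicas: int) -> list:
    """Nodes contacted when a view is partitioned by the indexed value.

    cl_replicas is how many replicas the consistency level needs
    (1 for ONE, 2 for QUORUM with rf=3). Hypothetical helper, not a Cassandra API.
    """
    return ring.replicas(indexed_value, rf)[:cl_replicas]


ring = TokenRing([f"node{i}" for i in range(8)])
one = view_lookup_nodes(ring, "user42@example.com", rf=3, cl_replicas=1)
quorum = view_lookup_nodes(ring, "user42@example.com", rf=3, cl_replicas=2)
print(one, quorum)  # 1 node, then 2 nodes, out of 8
```

The same hashing explains the range-query limitation: neighbouring values such as "user41@example.com" and "user42@example.com" get unrelated tokens, so a predicate like `<` or `>` on the indexed value cannot be answered from one partition.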
> >
> >
> >
> >
> >
> > On Sat, Oct 15, 2016 at 11:55 AM, Kant Kodali <[email protected]> wrote:
> >
> >> Well, I went with the definition from Wikipedia, and that definition rules
> >> out #1, so it is #2; in my case there is just one matching row.
> >>
> >>
> >>
> >> On Sat, Oct 15, 2016 at 2:40 AM, DuyHai Doan <[email protected]> wrote:
> >>
> >> > Define precisely what you mean by "high cardinality columns". Do you mean:
> >> >
> >> > 1) a single indexed value is present in a lot of rows
> >> > 2) a single indexed value has only a few (if not just one) matching rows
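The two readings can be told apart by looking at how many rows carry each indexed value. The small Python sketch below uses made-up column data and a hypothetical helper name; reading #1 shows up as few distinct values with many rows each, and reading #2 as roughly one row per distinct value.

```python
from collections import Counter


def cardinality_profile(values):
    """Return (distinct values, average rows per value, max rows per value)."""
    counts = Counter(values)
    if not counts:
        return 0, 0.0, 0
    return len(counts), len(values) / len(counts), max(counts.values())


country = ["FR", "FR", "US", "FR", "US"]            # reading 1: each value in many rows
email = [f"user{i}@example.com" for i in range(5)]  # reading 2: one row per value
print(cardinality_profile(country))  # (2, 2.5, 3)
print(cardinality_profile(email))    # (5, 1.0, 1)
```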
> >> >
> >> >
> >> > On Sat, Oct 15, 2016 at 8:37 AM, Kant Kodali <[email protected]> wrote:
> >> >
> >> >> I understand that secondary indexes in general are inefficient on
> >> >> high-cardinality columns, but since SASI was built from scratch, I
> >> >> wonder whether the same argument applies there. If not, why not? I
> >> >> believe primary keys in Cassandra are indexed, and since the primary key
> >> >> is supposed to be the column with the highest cardinality, why not do
> >> >> the same for secondary indexes?
> >> >>
> >> >
> >> >
> >>
> >
> >
>
