If you read my blog post about the 2nd index deep dive, you'll get all the answers.

On 21 Oct 2016 at 10:20, "Kant Kodali" <k...@peernova.com> wrote:
> Why can't a secondary index be broken down into token ranges like the
> primary index, at least for exact matches? That way we don't need to scan
> the whole cluster, at least for exact matches. I understand that for a
> substring search there will be 2^n substrings, which equates to 2^n
> hashes/tokens, which can be a lot!
>
> On Sat, Oct 15, 2016 at 4:35 AM, DuyHai Doan <doanduy...@gmail.com> wrote:
>
> > If each indexed value has very few matching rows, then querying using
> > SASI (or any impl of secondary index) may scan the whole cluster.
> >
> > This is because the indexes are "distributed", i.e. the indexed values
> > stay on the same nodes as the base data. And even SASI with its own
> > data structure will not help much here.
> >
> > One should understand that the 2nd index query has to deal with 2 layers:
> >
> > 1) The cluster layer, which is common to any impl of 2nd index. Read my
> > blog post here:
> > http://www.planetcassandra.org/blog/cassandra-native-secondary-index-deep-dive/
> >
> > 2) The local read path, which depends on the impl of 2nd index. Some use
> > the Lucene library, like the Stratio impl; some roll their own data
> > structures, like SASI.
> >
> > If you have a 1-to-1 relationship between the indexed value and the
> > matching row (or 1-to-a-few), I would recommend using materialized views
> > instead:
> >
> > http://www.slideshare.net/doanduyhai/sasi-cassandra-on-the-full-text-search-ride-voxxed-daybelgrade-2016/25
> >
> > Materialized views guarantee that for each searched indexed value, you
> > only hit a single node (or N replicas, depending on the consistency level
> > used).
> >
> > However, materialized views have their own drawbacks (weaker consistency
> > guarantees) and you can't use range queries (<, >, ≤, ≥) or full text
> > search on the indexed value.
> >
> > On Sat, Oct 15, 2016 at 11:55 AM, Kant Kodali <k...@peernova.com> wrote:
> >
> >> Well, I went with the definition from Wikipedia, and that definition
> >> rules out #1, so it is #2, and it is just one matching row in my case.
> >>
> >> On Sat, Oct 15, 2016 at 2:40 AM, DuyHai Doan <doanduy...@gmail.com>
> >> wrote:
> >>
> >> > Define precisely what you mean by "high cardinality columns". Do you
> >> > mean:
> >> >
> >> > 1) a single indexed value is present in a lot of rows
> >> > 2) a single indexed value has only a few (if not just one) matching
> >> > rows
> >> >
> >> > On Sat, Oct 15, 2016 at 8:37 AM, Kant Kodali <k...@peernova.com>
> >> > wrote:
> >> >
> >> >> I understand secondary indexes in general are inefficient on high
> >> >> cardinality columns, but since SASI is built from scratch I wonder
> >> >> if the same argument applies there? If not, why? Because I believe
> >> >> primary keys in Cassandra are indeed indexed, and since the primary
> >> >> key is supposed to be the column with the highest cardinality, why
> >> >> not do the same for secondary indexes?
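
To make the "cluster layer" point from the thread concrete, here is a toy Python sketch. It is not Cassandra code: the `Cluster` class, the 4-node count, and the md5-based `token` function are all assumptions invented for illustration. It only models the idea that a local secondary index lives next to the base data (placed by the partition key's token), so the coordinator cannot tell which node holds a given indexed value, whereas a materialized-view-like table is partitioned by the indexed value itself.

```python
import hashlib

NODES = 4  # hypothetical cluster size, chosen only for the example


def token(key: str) -> int:
    """Toy partitioner: map a key to one of NODES nodes."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NODES


class Cluster:
    def __init__(self):
        # Base table: per node, partition key -> indexed column value.
        self.base = [dict() for _ in range(NODES)]
        # Local secondary index: per node, value -> partition keys on that node.
        self.local_index = [dict() for _ in range(NODES)]
        # View-like table: placed by the token of the indexed value.
        self.view = [dict() for _ in range(NODES)]

    def insert(self, pk: str, value: str) -> None:
        node = token(pk)
        self.base[node][pk] = value
        # The index entry stays on the same node as the base row.
        self.local_index[node].setdefault(value, set()).add(pk)
        # The view entry goes to the node that owns the value's token.
        self.view[token(value)].setdefault(value, set()).add(pk)

    def query_local_index(self, value: str):
        """Exact match via local indexes: every node has to be asked."""
        hits, contacted = set(), 0
        for index in self.local_index:
            contacted += 1
            hits |= index.get(value, set())
        return hits, contacted

    def query_view(self, value: str):
        """Exact match via the view: the value's token names the node."""
        node = token(value)
        return set(self.view[node].get(value, set())), 1


cluster = Cluster()
for i in range(100):
    cluster.insert(f"user{i}", f"email{i}@example.com")

print(cluster.query_local_index("email42@example.com"))
print(cluster.query_view("email42@example.com"))
```

In this simplified model both queries return the same row, but the local-index query contacts all 4 nodes while the view query contacts 1. A real coordinator can stop early once a LIMIT is satisfied, and replicas add further nodes, so treat the counts as an illustration of the trade-off rather than a measurement.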