[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522136#comment-14522136 ]
Matthias Broecheler commented on CASSANDRA-6477:
------------------------------------------------
I think the discussion around materialized views (which I would love to see in
C* at some point) is distracting from what this ticket is really about: closing
a hole in the indexing story for C*.
In an RDBMS (and in pretty much every other database system), indexes are used
to efficiently retrieve the set of rows identified by their column values, in a
particular order, at the expense of write performance. By design, C* builds a
100% selectivity index on the primary key. In addition, one can install
secondary indexes. Those secondary indexes are useful up to a certain
selectivity threshold. Beyond that threshold, it becomes increasingly efficient
to maintain the index as a global distributed hash map rather than as a local
index on each node. And that's the hole in the indexing story, because indexes
of that type must currently be maintained by the application.
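To make the local-versus-global distinction concrete, here is a toy Python model (not Cassandra code) that counts how many nodes a lookup by indexed value has to contact. Assumptions, all hypothetical: a fixed 8-node cluster, CRC32-modulo placement standing in for Cassandra's partitioner and token ring, no replication, and no handling of updates or deletes, so the stale-entry consistency problem is deliberately ignored. The class names are invented for illustration.

```python
import zlib
from collections import defaultdict

NUM_NODES = 8  # hypothetical cluster size


def owner(key: str) -> int:
    """Map a key to a node (assumption: CRC32 modulo, a stand-in for the partitioner)."""
    return zlib.crc32(key.encode()) % NUM_NODES


class LocalIndexCluster:
    """Each node indexes only the rows it stores, like a C* secondary index."""

    def __init__(self):
        self.rows = [dict() for _ in range(NUM_NODES)]
        self.index = [defaultdict(set) for _ in range(NUM_NODES)]

    def write(self, pk: str, value: str):
        node = owner(pk)
        self.rows[node][pk] = value
        self.index[node][value].add(pk)  # index entry lives next to the row

    def lookup(self, value: str):
        hits, contacted = set(), set()
        for node in range(NUM_NODES):  # matches could be on any node, so ask all of them
            contacted.add(node)
            hits |= self.index[node].get(value, set())
        return hits, contacted


class GlobalIndexCluster:
    """Index entries are partitioned by the indexed value: a distributed hash map."""

    def __init__(self):
        self.rows = [dict() for _ in range(NUM_NODES)]
        self.index = [defaultdict(set) for _ in range(NUM_NODES)]

    def write(self, pk: str, value: str):
        self.rows[owner(pk)][pk] = value
        # Second write, possibly to a different node: the extra write cost.
        self.index[owner(value)][value].add(pk)

    def lookup(self, value: str):
        index_node = owner(value)
        hits = set(self.index[index_node].get(value, set()))
        # Contact the index owner, then the owners of the matching rows.
        contacted = {index_node} | {owner(pk) for pk in hits}
        return hits, contacted


local, glob = LocalIndexCluster(), GlobalIndexCluster()
for i in range(1000):
    email = f"user{i}@example.com"  # high-cardinality value: one row per value
    local.write(f"user{i}", email)
    glob.write(f"user{i}", email)

print(len(local.lookup("user42@example.com")[1]))  # 8: every node is asked
print(len(glob.lookup("user42@example.com")[1]))   # 1 or 2: index owner plus row owner
```

For a high-cardinality column the local lookup always fans out to every node, while the global lookup touches at most two; the price is that every write to the global variant may land on two nodes.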
I am stating the obvious here to point out that the first problem is to provide
the infrastructure to create that second class of indexes while ensuring some
form of (eventual) consistency. Much like with secondary indexes (2i), once that
is in place one can use the infrastructure to build other things on top,
including materialized views, which will need it in the first place (if the
primary key of your materialized view has high selectivity).
As for nomenclature, I agree that "global vs. local" index is a technical
distinction that means little to nothing to the user. After all, users want to
build an index to get to their data quickly; how that happens is very much
secondary. Initially, it might make sense to ask the user to specify a
selectivity estimate for the index (defaulting to low) and have C* pick the
best indexing approach based on that. In the future, one could use sampled
histograms to help the user with that decision.
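The proposal in the paragraph above (a user-supplied selectivity estimate, defaulting to low, with C* choosing the implementation) could look roughly like this hypothetical Python sketch. It takes selectivity to mean distinct indexed values divided by rows, so a primary key scores 1.0. The 0.1 cutoff, `IndexKind`, `choose_index_kind`, and `estimate_selectivity` are all invented for illustration, and the distinct-value ratio over a sample is only a crude stand-in for the sampled histograms mentioned, not a real histogram.

```python
from enum import Enum
from typing import Optional, Sequence


class IndexKind(Enum):
    LOCAL = "local"    # per-node index; queries fan out to all nodes
    GLOBAL = "global"  # index partitioned by the indexed value

GLOBAL_THRESHOLD = 0.1  # hypothetical cutoff, not a real Cassandra setting


def choose_index_kind(selectivity: Optional[float] = None) -> IndexKind:
    """Pick an index implementation from a user-supplied selectivity estimate.

    selectivity = distinct indexed values / rows (a primary key has 1.0).
    A missing estimate defaults to low selectivity, i.e. a local index.
    """
    if selectivity is None:
        return IndexKind.LOCAL
    if not 0.0 < selectivity <= 1.0:
        raise ValueError("selectivity must be in (0, 1]")
    return IndexKind.GLOBAL if selectivity >= GLOBAL_THRESHOLD else IndexKind.LOCAL


def estimate_selectivity(sample: Sequence[str]) -> float:
    """Crude estimate from a sample of column values (stand-in for histograms)."""
    if not sample:
        raise ValueError("sample must not be empty")
    return len(set(sample)) / len(sample)


print(choose_index_kind())  # IndexKind.LOCAL
countries = ["US"] * 50 + ["DE"] * 50
print(choose_index_kind(estimate_selectivity(countries)))  # IndexKind.LOCAL
emails = [f"u{i}@example.com" for i in range(100)]
print(choose_index_kind(estimate_selectivity(emails)))  # IndexKind.GLOBAL
```

Defaulting to `LOCAL` when no estimate is given mirrors the "defaulting to low" suggestion, so existing secondary-index behavior would be unchanged unless the user opts in.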
> Global indexes
> --------------
>
> Key: CASSANDRA-6477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
> Project: Cassandra
> Issue Type: New Feature
> Components: API, Core
> Reporter: Jonathan Ellis
> Assignee: Carl Yeksigian
> Labels: cql
> Fix For: 3.x
>
>
> Local indexes are suitable for low-cardinality data, where spreading the
> index across the cluster is a Good Thing. However, for high-cardinality
> data, local indexes require querying most nodes in the cluster even if only a
> handful of rows is returned.