[ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496592#comment-14496592
 ] 

Sylvain Lebresne commented on CASSANDRA-6477:
---------------------------------------------

Let's recall that the main problem here is how to keep the index consistent 
with the original table. That's typically a problem if, say, 2 clients 
simultaneously update the same column to 2 different values: we need to make 
sure that only whichever of those updates wins ends up in the index.
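To make the race concrete, here is a minimal sketch using a toy in-memory model. `IndexRaceDemo`, `Cell` and the `value->key` index encoding are hypothetical and are not Cassandra classes. When both clients read the old value before either one writes, each removes the same old entry and the losing value stays in the index. Running the same two updates through a per-replica `synchronized` read-before-write leaves only the winner.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of one base-table column and a global index on it.
class IndexRaceDemo {
    // A cell value with its write timestamp; the higher timestamp wins.
    record Cell(String value, long timestamp) {}

    final Map<String, Cell> table = new HashMap<>();
    final Set<String> index = new HashSet<>(); // entries are "indexedValue->partitionKey"

    IndexRaceDemo() {
        table.put("k", new Cell("x", 0)); // start with row k = x, already indexed
        index.add("x->k");
    }

    // Read step of read-before-write.
    Cell read(String key) {
        return table.get(key);
    }

    // Write step: fix up the index using the value read earlier, then apply the cell.
    void write(String key, Cell previous, Cell update) {
        if (previous != null && previous.timestamp() >= update.timestamp()) return; // update loses
        if (previous != null) index.remove(previous.value() + "->" + key);
        index.add(update.value() + "->" + key);
        Cell current = table.get(key);
        if (current == null || update.timestamp() > current.timestamp()) table.put(key, update);
    }

    // Serialized read-before-write: no other update can slip between read and write.
    synchronized void apply(String key, Cell update) {
        write(key, read(key), update);
    }

    // Two updates whose reads both happen before either write (the race).
    static Set<String> racyInterleaving() {
        IndexRaceDemo demo = new IndexRaceDemo();
        Cell seenByA = demo.read("k");
        Cell seenByB = demo.read("k");
        demo.write("k", seenByA, new Cell("a", 1));
        demo.write("k", seenByB, new Cell("b", 2));
        return demo.index;
    }

    // The same two updates, serialized on the replica.
    static Set<String> serialized() {
        IndexRaceDemo demo = new IndexRaceDemo();
        demo.apply("k", new Cell("a", 1));
        demo.apply("k", new Cell("b", 2));
        return demo.index;
    }

    public static void main(String[] args) {
        System.out.println(racyInterleaving()); // holds both a->k and b->k: a stale entry
        System.out.println(serialized());       // [b->k]
    }
}
```

In the racy run the data table correctly ends with `b`, but the index still points `a` at row `k`.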

Since for global indexes we know we'll have to do a read before write, what has 
been suggested here is to do that on replicas, at which point we can serialize 
concurrent updates locally to make sure things end up consistent. Now, we could 
do that on every replica, but this has a few downsides:
# every replica will update the index, so we'll do RF times too many index 
updates.
# once a replica has done its read and computed the update for the data table 
and the index table, we want to put both of those in a batch mutation to avoid 
inconsistencies in case of failures. This makes writes more expensive, and thus 
the duplicated work all the less desirable.
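A rough sketch of both downsides, again with hypothetical names (`PerReplicaIndexing`, `Mutation`, `buildBatch`) rather than actual Cassandra code. Each replica's batch carries the data write plus an index delete for the old value and an index insert for the new one. If all RF base replicas send those index mutations, each index replica receives every mutation RF times.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the "every replica maintains the index" approach.
class PerReplicaIndexing {
    // table is "data" or "index"; for index mutations, partition is the indexed value
    // and payload is the base-table key it points to.
    record Mutation(String table, String partition, String payload, boolean isDelete) {}

    // What one base replica builds after its local read: the data write plus the
    // index fix-up, kept in one list so they can be applied together as a batch.
    static List<Mutation> buildBatch(String key, String oldValue, String newValue) {
        List<Mutation> batch = new ArrayList<>();
        batch.add(new Mutation("data", key, newValue, false));
        if (oldValue != null) {
            batch.add(new Mutation("index", oldValue, key, true)); // remove stale entry
        }
        batch.add(new Mutation("index", newValue, key, false));    // add new entry
        return batch;
    }

    static long indexMutationsPerBatch(String key, String oldValue, String newValue) {
        return buildBatch(key, oldValue, newValue).stream()
                .filter(m -> m.table().equals("index"))
                .count();
    }

    // Index-replica writes actually needed: each index mutation lands on rf index replicas.
    static long indexWritesNeeded(int rf, String key, String oldValue, String newValue) {
        return indexMutationsPerBatch(key, oldValue, newValue) * rf;
    }

    // If all rf base replicas compute and send the same index mutations,
    // every index replica receives each one rf times.
    static long indexWritesIfEveryReplicaIndexes(int rf, String key, String oldValue, String newValue) {
        return rf * indexWritesNeeded(rf, key, oldValue, newValue);
    }

    public static void main(String[] args) {
        System.out.println(indexWritesNeeded(3, "k", "x", "a"));                // 6
        System.out.println(indexWritesIfEveryReplicaIndexes(3, "k", "x", "a")); // 18
    }
}
```

With RF = 3 and an already-indexed old value, 6 index-replica writes are needed but 18 are performed.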

To avoid that duplication, one possibility is to reuse the same technique we 
use for counters: have the coordinator push the update to one random replica, 
and have that one replica do the read before write and push everything (data 
and index updates) through a batchlog mutation.
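Roughly, and again under a toy model with hypothetical names (`LeaderReplicaWrite`, `Replica`, and a plain list standing in for the batchlog), the counter-style path could look like the following. Only the chosen replica reads the old value and computes the index delta, and the batch is logged before it is applied. Concurrency control and the delivery of index mutations to the index replicas are left out of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of the counter-style path: one replica does the read before write.
class LeaderReplicaWrite {
    // For index mutations, partition is the indexed value and payload the base key.
    record Mutation(String table, String partition, String payload, boolean isDelete) {}

    // Stand-in for one replica's copy of the indexed column of a single row.
    static final class Replica {
        String storedValue;
        Replica(String storedValue) { this.storedValue = storedValue; }
    }

    // Coordinator: forward the update to one randomly chosen replica, as is done for counters.
    static List<Mutation> write(List<Replica> replicas, Random random,
                                List<List<Mutation>> batchlog, String key, String newValue) {
        Replica leader = replicas.get(random.nextInt(replicas.size()));
        return leaderWrite(leader, replicas, batchlog, key, newValue);
    }

    // Leader: read once, build data + index mutations, log the batch, then apply it.
    static List<Mutation> leaderWrite(Replica leader, List<Replica> replicas,
                                      List<List<Mutation>> batchlog, String key, String newValue) {
        String oldValue = leader.storedValue;                // the single read before write
        List<Mutation> batch = new ArrayList<>();
        batch.add(new Mutation("data", key, newValue, false));
        if (oldValue != null) batch.add(new Mutation("index", oldValue, key, true));
        batch.add(new Mutation("index", newValue, key, false));
        batchlog.add(batch);                                 // recorded first so it can be replayed on failure
        for (Replica r : replicas) r.storedValue = newValue; // data mutation reaches every base replica
        // the index mutations would be sent to the index replicas here (not modeled)
        return batch;
    }
}
```

Compared with the per-replica approach, the index delta is computed once per update, so each index mutation is sent RF times in total rather than RF squared.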

The currently linked branch doesn't do all of that yet, so it'll have to be 
added before we can commit this.

On top of this, I think that we'll need 2 other things that are not handled yet 
by the branch:
* being able to index tables that have collections. Indexing collections, which 
is also not yet supported, can probably be left to a follow-up ticket.
* making sure we hook index rebuilds into streaming so that when the data 
table is repaired, the index gets repaired too.
Once those have been tackled, I think we can call it good for an initial 
version and leave other improvements to follow-ups.
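For the streaming point, a minimal sketch assuming a hypothetical per-row hook (`StreamingIndexHook.onStreamedRow` is not an existing Cassandra API). Each row received during repair goes through the same timestamp comparison and index fix-up as a normal write, instead of landing in the data table without touching the index.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of index maintenance for rows received via streaming.
class StreamingIndexHook {
    // A cell value with its write timestamp; the higher timestamp wins.
    record Cell(String value, long timestamp) {}

    final Map<String, Cell> table = new HashMap<>();
    final Set<String> index = new HashSet<>(); // entries are "indexedValue->partitionKey"

    // Called for each streamed row; only a row that supersedes the local copy touches the index.
    void onStreamedRow(String key, Cell incoming) {
        Cell local = table.get(key);
        if (local != null && local.timestamp() >= incoming.timestamp()) return; // local copy wins
        if (local != null) index.remove(local.value() + "->" + key);           // drop stale entry
        index.add(incoming.value() + "->" + key);
        table.put(key, incoming);
    }
}
```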


> Global indexes
> --------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0
>
>
> Local indexes are suitable for low-cardinality data, where spreading the 
> index across the cluster is a Good Thing.  However, for high-cardinality 
> data, local indexes require querying most nodes in the cluster even if only a 
> handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
