[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)

Sylvain Lebresne (JIRA) Fri, 17 Jul 2015 00:41:50 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630910#comment-14630910
 ]


Sylvain Lebresne commented on CASSANDRA-6477:
---------------------------------------------

bq. Why do we need this at all? Since replicas are in charge of updating MV 
then normal hints should perform the same function as batchlog except without 
the performance hit in the normal case.

Allow me to sum up how we deal with consistency guarantees, why we do it this
way, and why I don't think hints work. I'm sorry if this response is a bit
verbose, but as this is the most important part of this ticket imo, I think it
bears repeating to make sure we're all on the same page.

The main guarantee we have to provide here is that MVs are eventually consistent
with their base table. In other words, whatever failure scenarios we run into,
we should never have an inconsistency that never gets resolved. The canonical
example of why this is not a given is that we have a column {{c = 2}} in the
base table that is also in a MV PK, and we have 2 concurrent updates A (sets
{{c = 3}}) and B (sets {{c = 4}}). Without any kind of protection, we could end
up with the MV permanently having 2 entries, one for A and one for B, which is
incorrect (the MV should eventually converge to the update that has the biggest
timestamp, since that's what the base table will keep). To the best of my
knowledge, there are 2 fundamental components to avoiding such permanent
inconsistency in the currently written patch/approach:
# On each replica, we synchronize/serialize the read-before-write done on the
base table. This guarantees that we won't have A and B racing on a single
base-table replica. Or, in other words, *if* the same replica sees both updates
(where "sees" means "does the read-before-write-and-update-MV-accordingly"
dance), then it will properly update the MV. And since each base-table replica
updates all MV-table replicas, it's enough that a single base-table replica
sees both updates to guarantee eventual consistency of the MV. But we do need
to guarantee that _at least_ one such base-table replica sees both updates, and
that's the 2nd component.
# To provide that latter guarantee, we first put each base-table update that
includes MV updates in the batchlog on the coordinator, and we only remove it
from the batchlog once a _QUORUM_ of replicas have acknowledged the write (this
is, importantly, independent of the CL: eventual consistency must be guaranteed
whatever CL you use). That guarantees that until a QUORUM of replicas have seen
the update, we'll keep replaying it, which in turn guarantees that for any 2
updates, at least one replica will have "seen" them both (any two QUORUMs of
the same replica set overlap).
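
To make the two components above concrete, here is a toy sketch in Python. It is not the patch's code: {{View}}, {{BaseReplica}}, {{run}} and {{quorums_always_overlap}} are hypothetical names, timestamps are plain integers, and the tombstone written for a stale update is a simplification of this sketch rather than something taken from the patch. It only illustrates why one replica seeing both updates is enough, and why two QUORUMs always share a replica.

```python
from itertools import combinations


class View:
    """Toy view keyed by column c; per key, the highest timestamp wins."""

    def __init__(self):
        self.inserted = {}  # view key -> newest insert timestamp
        self.deleted = {}   # view key -> newest tombstone timestamp

    def insert(self, key, ts):
        self.inserted[key] = max(ts, self.inserted.get(key, -1))

    def delete(self, key, ts):
        self.deleted[key] = max(ts, self.deleted.get(key, -1))

    def live_keys(self):
        return sorted(k for k, ts in self.inserted.items()
                      if ts > self.deleted.get(k, -1))


class BaseReplica:
    """Toy base-table replica: one row whose column c is in the view PK."""

    def __init__(self, c, ts):
        self.c, self.ts = c, ts

    def apply(self, new_c, new_ts, view):
        # Read-before-write. Assumed to run under a per-row lock, so two
        # updates never interleave on the same replica (component 1).
        if new_ts > self.ts:
            if new_c != self.c:
                view.delete(self.c, new_ts)  # retire the old value's entry
            view.insert(new_c, new_ts)
            self.c, self.ts = new_c, new_ts
        elif new_c != self.c:
            # Stale update: the base row keeps its newer value, so the
            # stale value must not stay live in the view (sketch-only rule).
            view.delete(new_c, self.ts)


def run(deliveries):
    """deliveries: for each base replica, the ordered updates it sees."""
    view = View()
    view.insert(2, 1)  # initial state: c = 2 written at timestamp 1
    for updates in deliveries:
        replica = BaseReplica(c=2, ts=1)
        for c, ts in updates:
            replica.apply(c, ts, view)
    return view.live_keys()


def quorums_always_overlap(n):
    """Brute-force check that any two quorums of n replicas intersect."""
    q = n // 2 + 1
    groups = [set(g) for g in combinations(range(n), q)]
    return all(a & b for a in groups for b in groups)


A, B = (3, 10), (4, 20)  # (new value of c, timestamp)
```

With these definitions, {{run([[A], [B]])}} (no replica sees both updates) leaves both 3 and 4 live in the view, while {{run([[A], [B], [A, B]])}} and {{run([[A], [B], [B, A]])}} both converge to the single entry 4, whichever order the third replica sees them in. {{quorums_always_overlap(3)}} holds, which is the overlap that component 2 relies on.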

Now, the latter guarantee cannot be provided by hints because we can't
guarantee hint delivery in the face of failures. Typically, if I write a hint
on a node and that node dies in a fire before that hint is delivered, it will
never be delivered. We need a distributed hint mechanism, if you will, and
that's what the batchlog gives us.


> Materialized Views (was: Global Indexes)
> ----------------------------------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0 beta 1
>
>         Attachments: test-view-data.sh, users.yaml
>
>
> Local indexes are suitable for low-cardinality data, where spreading the 
> index across the cluster is a Good Thing.  However, for high-cardinality 
> data, local indexes require querying most nodes in the cluster even if only a 
> handful of rows is returned.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
