[
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630910#comment-14630910
]
Sylvain Lebresne commented on CASSANDRA-6477:
---------------------------------------------
bq. Why do we need this at all? Since replicas are in charge of updating MV
then normal hints should perform the same function as batchlog except without
the performance hit in the normal case.
Allow me to sum up how we deal with consistency guarantees, why we do it this
way and why I don't think hints work. I'm sorry if this response is a bit
verbose, but since this is the most important part of this ticket imo, I think
it bears repeating to make sure we're all on the same page.
The main guarantee we have to provide here is that MV are eventually consistent
with their base table. In other words, whatever failure scenarios we run into,
we should never have an inconsistency that never gets resolved. The canonical
example of why this is not a given: we have a column {{c = 2}} in the base
table that is also in a MV PK, and we have 2 concurrent updates A (sets {{c =
3}}) and B (sets {{c = 4}}). Without any kind of protection, we could end up
with the MV permanently having 2 entries, one for A and one for B, which is
incorrect (the MV should eventually converge to the update that has the biggest
timestamp since that's what the base table will keep). To the best of my
knowledge, there are 2 fundamental components to avoiding such permanent
inconsistency in the currently written patch/approach:
# On each replica, we synchronize/serialize the read-before-write done on the
base table. This guarantees that we won't have A and B racing on a single
base-table replica. Or, in other words, *if* the same replica sees both updates
(where "sees" means "does the read-before-write-and-update-MV-accordingly"
dance), then it will properly update the MV. And since each base-table replica
updates all MV-table replicas, it's enough that a single base-table replica
sees both updates to guarantee eventual consistency of the MV. But we do need
to guarantee that _at least_ one such base-table replica sees both updates, and
that's the 2nd component.
# To provide that latter guarantee, we first put each base-table update that
includes MV updates in the batchlog on the coordinator, and we only remove it
from the batchlog once a _QUORUM_ of replicas have acknowledged the write (this
is, importantly, not dependent on the CL: eventual consistency must be
guaranteed whatever CL you use). That guarantees that until a QUORUM of
replicas have seen the update, we'll keep replaying it, which in turn
guarantees that for any 2 updates, at least one replica will have "seen" them
both.
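The "at least one replica sees both" claim in the second point rests on the fact that any two QUORUMs of the same replica set overlap. A minimal sketch of that counting argument, assuming the usual QUORUM size of floor(RF/2) + 1; the function names are hypothetical and nothing here is taken from the patch:

```python
from itertools import combinations


def quorum(rf):
    """Size of a QUORUM for replication factor rf (assumed: floor(rf/2) + 1)."""
    return rf // 2 + 1


def every_pair_overlaps(rf, acks):
    """True if any two sets of `acks` replicas out of `rf` share a replica.

    Each set stands for the replicas that acknowledged one update before it
    was removed from the batchlog.
    """
    groups = [set(g) for g in combinations(range(rf), acks)]
    return all(a & b for a in groups for b in groups)


if __name__ == "__main__":
    for rf in (1, 2, 3, 5):
        q = quorum(rf)
        print(f"RF={rf} QUORUM={q} overlap guaranteed: {every_pair_overlaps(rf, q)}")
    # Waiting for a single ack with RF=3 gives no such guarantee.
    print("RF=3 acks=1 overlap guaranteed:", every_pair_overlaps(3, 1))
```

With fewer acknowledgements than a QUORUM (for example 1 out of 3), two updates can land on disjoint replicas, which is why the batchlog removal threshold cannot follow a weaker CL.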
Now, the latter guarantee cannot be provided by hints because we can't
guarantee hint delivery in the face of failures. Typically, if I write a hint
on a node and that node dies in a fire before the hint is delivered, it will
never be delivered. We need a distributed hint mechanism, if you will, and
that's what the batchlog gives us.
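To make the canonical example concrete ({{c = 2}}, update A sets {{c = 3}}, update B sets {{c = 4}}), here is a toy model of the read-before-write on base-table replicas. It is only a sketch under stated assumptions, not the patch's code: the view is a single shared dict keyed by {{c}}, view entries are reconciled by highest timestamp with tombstones winning ties, an update that is already shadowed in the base table is assumed to write a tombstone for its would-be view entry, and the sequential loop stands in for the per-replica serialization. All names are hypothetical.

```python
def write_view(view, key, ts, live):
    """Reconcile one view mutation: highest timestamp wins, tombstone wins ties."""
    cur = view.get(key)
    if cur is None or (ts, not live) > (cur[0], not cur[1]):
        view[key] = (ts, live)


def apply_to_base(replica, view, new_c, ts):
    """Serialized read-before-write on one base-table replica (toy model)."""
    old_c, old_ts = replica["c"], replica["ts"]
    if ts > old_ts:
        replica["c"], replica["ts"] = new_c, ts
        if old_c != new_c:
            write_view(view, old_c, ts, live=False)  # remove the stale entry
        write_view(view, new_c, ts, live=True)
    else:
        # Assumption: the update is shadowed in the base table, so make sure
        # the view cannot keep an entry for it either.
        write_view(view, new_c, old_ts, live=False)


A = (3, 1)  # sets c = 3 at timestamp 1
B = (4, 2)  # sets c = 4 at timestamp 2


def run(deliveries):
    """deliveries: one list of updates per base replica, in arrival order."""
    view = {2: (0, True)}  # initial view entry for c = 2
    for updates in deliveries:
        replica = {"c": 2, "ts": 0}
        for new_c, ts in updates:
            apply_to_base(replica, view, new_c, ts)
    return sorted(k for k, (_, live) in view.items() if live)


if __name__ == "__main__":
    print(run([[A], [B]]))     # no replica sees both updates: [3, 4]
    print(run([[A], [B, A]]))  # one replica sees B then A: [4]
    print(run([[A, B], [B]]))  # one replica sees A then B: [4]
```

In this model the view keeps two live entries only when no base replica processes both updates; as soon as one replica sees both, in either order, the view converges to the entry with the biggest timestamp.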
> Materialized Views (was: Global Indexes)
> ----------------------------------------
>
> Key: CASSANDRA-6477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
> Project: Cassandra
> Issue Type: New Feature
> Components: API, Core
> Reporter: Jonathan Ellis
> Assignee: Carl Yeksigian
> Labels: cql
> Fix For: 3.0 beta 1
>
> Attachments: test-view-data.sh, users.yaml
>
>
> Local indexes are suitable for low-cardinality data, where spreading the
> index across the cluster is a Good Thing. However, for high-cardinality
> data, local indexes require querying most nodes in the cluster even if only a
> handful of rows is returned.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)