[https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631891#comment-14631891]
Benedict commented on CASSANDRA-6477:
-------------------------------------
OK, so I think we have a lot of competing goals that are not being discussed in
a clearly delineated fashion:
# How do we ensure the MV is not corrupted?
# How do we maintain AP?
# How do we best honour the user's consistency level?
# How do we do it quickly?
Unfortunately, goals (1) and (2) are directly opposed to each other. I hadn't
originally envisaged having 20+ MVs, but if that's on the cards, (1) really must
reign supreme. Otherwise we will end up with a {{CL.LITERALLY_EVERYONE}}
scenario.
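To give a feel for why a write that needs every involved replica to respond is a non-starter, here is a back-of-the-envelope availability calculation. It is a sketch under simplifying assumptions: independent node failures and no replica shared between the base table and the views. The RF and view count are hypothetical.

```python
def all_ack_availability(p_up: float, replicas: int) -> float:
    """Chance that all `replicas` nodes are up, assuming independent failures."""
    return p_up ** replicas

# Hypothetical cluster: base table at RF=3 plus 20 views, each also at RF=3,
# with no replica shared between base and views (a simplifying assumption).
RF = 3
VIEWS = 20
replicas_touched = RF + VIEWS * RF  # 63

for p_up in (0.999, 0.99):
    print(f"p_up={p_up}: write succeeds with probability "
          f"{all_ack_availability(p_up, replicas_touched):.3f}")
```

The point is only that the success probability decays geometrically with the number of replicas that must respond.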
The problem here is that the coordinator-level batchlog is almost useless, and
in fact I think *absolutely guaranteeing* no MV corruption is nigh impossible.
If the base table on a replica has been updated but its paired MV has not, it
doesn't matter what the coordinator does with replay: the base replica's
read-before-write will find the row already updated, so it will not generate or
apply any delta.
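A toy model of the failure described above: the base replica derives view deltas by read-before-write, so once the base row has been updated, replaying the same mutation yields no delta. {{BaseReplica}}, {{apply}} and the single-column view are hypothetical illustrations, not Cassandra's classes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Delta = Tuple[str, str, str]  # (operation, view key, base key)

@dataclass
class BaseReplica:
    """Toy base-table replica maintaining one view keyed on `view_col` (hypothetical)."""
    view_col: str
    rows: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def apply(self, key: str, update: Dict[str, str]) -> List[Delta]:
        # Read-before-write: derive the view delta from the row as it exists now.
        old_val: Optional[str] = self.rows.get(key, {}).get(self.view_col)
        new_val = update.get(self.view_col, old_val)
        delta: List[Delta] = []
        if old_val != new_val:
            if old_val is not None:
                delta.append(("DELETE", old_val, key))
            delta.append(("INSERT", new_val, key))
        self.rows.setdefault(key, {}).update(update)
        return delta

replica = BaseReplica(view_col="email")
first = replica.apply("user1", {"email": "a@example.com"})
# Suppose `first` never reaches the view replica. Replaying the same base
# mutation (e.g. from a coordinator batchlog) finds the row already updated:
replay = replica.apply("user1", {"email": "a@example.com"})
print(first)   # [('INSERT', 'a@example.com', 'user1')]
print(replay)  # []
```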
So, I'm pretty sure we can drop the coordinator batchlog. What I propose
instead is to always ensure an owner is the coordinator: if a non-owner
coordinates, it just proxies the write on to one owning coordinator (or more,
if a response is slow).
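A sketch of the "proxy to an owner, and to more owners if a response is slow" rule, using Python's {{concurrent.futures}}. The {{coordinate}} and {{fake_send}} helpers, node names and the 50 ms speculation threshold are all hypothetical; errors from an owner are not retried in this sketch.

```python
import concurrent.futures as cf
import time
from typing import Callable, Sequence

def coordinate(local: str, owners: Sequence[str], send: Callable[[str], str],
               speculate_after: float = 0.05) -> str:
    """Hypothetical: ensure an owner coordinates; forward if we are not one."""
    if local in owners:
        return send(local)  # we own the token, so we coordinate ourselves
    pool = cf.ThreadPoolExecutor(max_workers=len(owners))
    try:
        pending = {pool.submit(send, owners[0])}
        spare = list(owners[1:])
        while True:
            done, pending = cf.wait(pending, timeout=speculate_after,
                                    return_when=cf.FIRST_COMPLETED)
            if done:
                return next(iter(done)).result()  # first owner to answer wins
            if spare:  # slow response: proxy to one more owner as well
                pending.add(pool.submit(send, spare.pop(0)))
    finally:
        pool.shutdown(wait=False)  # don't block on the slower owners

def fake_send(owner: str) -> str:
    time.sleep(1.0 if owner == "n1" else 0.01)  # n1 is slow in this toy run
    return f"coordinated by {owner}"

print(coordinate("n9", ["n1", "n2", "n3"], fake_send))  # n9 is not an owner
```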
_Loosely_ speaking, it can then:
# write the mutation to the other base replicas;
# perform its local read-before-write, and write to the local batchlog the
total set of MV deltas it will apply (along with the base mutation);
# write to its paired replica for one of its MVs both the MV mutation _and_ the
whole batchlog of mutations, including mutations for the other base replicas;
# once _any_ of these respond (the other base replicas, or the MV replica),
we're good to go, as we have confidence that two or more base replicas will
receive the mutation, and so we write to the remaining MV replicas.
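The four owner-coordinator steps above can be simulated in memory as follows. This is a sequential sketch under stated assumptions: the proposal sends these writes concurrently, whereas here they run in order and the return value of {{write}} stands in for an acknowledgement. {{Node}}, {{owner_coordinate}} and the string mutations are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """In-memory stand-in for a replica (hypothetical model, not Cassandra code)."""
    name: str
    up: bool = True
    data: List[str] = field(default_factory=list)
    batchlog: List[List[str]] = field(default_factory=list)

    def write(self, mutation: str, log_entry: Optional[List[str]] = None) -> bool:
        """Persist a mutation (and optionally a batchlog entry); True means 'acked'."""
        if not self.up:
            return False
        if log_entry is not None:
            self.batchlog.append(list(log_entry))
        self.data.append(mutation)
        return True

def owner_coordinate(me: Node, other_base: List[Node], mv_replicas: List[Node],
                     base_mut: str, mv_deltas: List[str]) -> bool:
    """mv_replicas[i] is this node's paired replica for view i; mv_deltas[i] is its delta."""
    # 1. write the mutation to the other base replicas
    base_acks = [n.write(base_mut) for n in other_base]
    # 2. read-before-write (deltas are taken as given here), then a single local
    #    batchlog entry holding the base mutation and every view delta together
    entry = [base_mut, *mv_deltas]
    me.write(base_mut, log_entry=entry)
    # 3. the paired replica of the first view gets its delta *and* the whole entry
    mv_ack = mv_replicas[0].write(mv_deltas[0], log_entry=entry)
    # 4. once any of these acknowledge, write to the remaining view replicas
    if not (any(base_acks) or mv_ack):
        return False
    for node, delta in zip(mv_replicas[1:], mv_deltas[1:]):
        node.write(delta)
    return True

coord = Node("coord")
r2, r3 = Node("r2", up=False), Node("r3", up=False)  # both other base replicas down
v1, v2 = Node("v1"), Node("v2")
ok = owner_coordinate(coord, [r2, r3], [v1, v2], "base:u1", ["v1:u1", "v2:u1"])
print(ok, v1.batchlog, v2.data)
```

Here the paired view replica's acknowledgement alone is enough to proceed to step 4, even though both other base replicas are down.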
The non-coordinator base replicas just perform all of their work at once: they
write their deltas (and their own base mutation) to the _local_ batchlog, then
write this batchlog to one of their MV replicas at the same time as sending
_all_ of their updates. This is safe because we already know the coordinator has
written this to its batchlog, so our doing so is enough to reach QUORUM for
commit.
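A minimal sketch of a non-coordinating base replica's "all at once" path, plus the QUORUM arithmetic behind the safety argument (coordinator copy plus this replica's copy, at a hypothetical RF=3). Nodes are plain dicts here; {{replica_write}}, {{quorum}} and the mutation strings are illustrative assumptions, and the concurrent sends are modelled as a loop.

```python
from typing import Dict, List

def quorum(rf: int) -> int:
    return rf // 2 + 1

def replica_write(local: Dict[str, list], mv_replicas: List[Dict[str, list]],
                  base_mut: str, mv_deltas: List[str]) -> None:
    """Hypothetical non-coordinating base replica: does all its work at once."""
    entry = [base_mut, *mv_deltas]
    local["batchlog"].append(entry)   # deltas + own mutation into the local batchlog
    local["data"].append(base_mut)
    # The proposal sends these concurrently; a plain loop models them here.
    mv_replicas[0]["batchlog"].append(list(entry))  # batchlog copy to one view replica
    for replica, delta in zip(mv_replicas, mv_deltas):
        replica["data"].append(delta)

def new_node() -> Dict[str, list]:
    return {"data": [], "batchlog": []}

rf = 3
coordinator = new_node()
coordinator["batchlog"].append(["base:u1", "v1:u1"])  # already written by the owner
me, view1 = new_node(), new_node()
replica_write(me, [view1], "base:u1", ["v1:u1"])
base_copies = sum(1 for n in (coordinator, me) if ["base:u1", "v1:u1"] in n["batchlog"])
print(base_copies, quorum(rf))  # 2 2
```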
The main advantage of this is that we have no synchronous operations, but we
still _reasonably_ guarantee that we eventually reach consistency, at least as
well as we do currently (I'm not 100% familiar with how we write to the local
batchlog currently, but ensuring we write the deltas for all MVs at once is
critical to correctness). The reason we do not need to perform any synchronous
batchlog writes is that if, for whatever reason, we lose all of the nodes
involved, then it does not matter that the batchlog records never made it:
neither did the base mutations, nor any follow-on MV mutations. The slate is
wiped clean.
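The "slate is wiped clean" argument can be checked by brute force under a simplified model: every node that persists the base mutation also holds a batchlog entry for it, so no crash pattern leaves a base mutation alive while every batchlog copy is lost. The node names and {{HOLDINGS}} table are assumptions of this sketch, not a description of Cassandra's storage.

```python
from itertools import combinations

# Hypothetical holdings after the protocol ran: what each node has persisted.
# Base replicas hold the base mutation plus their own batchlog entry; the paired
# view replica holds a view mutation plus a copy of the batchlog.
HOLDINGS = {
    "coordinator": {"base", "log"},
    "replica2": {"base", "log"},
    "replica3": {"base", "log"},
    "paired_view": {"view", "log"},
}

def violations():
    """Every set of surviving nodes that keeps a base mutation but no batchlog."""
    nodes = list(HOLDINGS)
    bad = []
    for k in range(len(nodes) + 1):
        for survivors in combinations(nodes, k):
            kept = set().union(*(HOLDINGS[n] for n in survivors))
            if "base" in kept and "log" not in kept:
                bad.append(survivors)
    return bad

print(violations())  # []
```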
I have to say, though: I'm more than a little worried about how repair factors
into this equation. I kind of suspect we need to guarantee all of these logs
are empty before we run repair, or we need to produce deltas on receipt of
repaired data. This is true of all of the outlined approaches.
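One of the two options above, producing deltas on receipt of repaired data, might look like the following: repaired rows go through the same read-before-write path as a normal write instead of being applied blindly. This is a hypothetical sketch ({{receive_repaired_row}} and {{view_delta}} are illustrative names) that ignores timestamps and conflict resolution entirely.

```python
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]
Delta = Tuple[str, str, str]  # (operation, view key, base key)

def view_delta(old: Optional[Row], new: Row, view_col: str, key: str) -> List[Delta]:
    old_val = old.get(view_col) if old else None
    new_val = new.get(view_col, old_val)
    if old_val == new_val:
        return []
    delta: List[Delta] = []
    if old_val is not None:
        delta.append(("DELETE", old_val, key))
    delta.append(("INSERT", new_val, key))
    return delta

def receive_repaired_row(table: Dict[str, Row], key: str, repaired: Row,
                         view_col: str) -> List[Delta]:
    """Hypothetical: run repaired data through read-before-write like a normal write."""
    delta = view_delta(table.get(key), repaired, view_col, key)
    table.setdefault(key, {}).update(repaired)
    return delta

stale = {"user1": {"email": "a@example.com"}}  # replica that missed an update
print(receive_repaired_row(stale, "user1", {"email": "b@example.com"}, "email"))
```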
> Materialized Views (was: Global Indexes)
> ----------------------------------------
>
> Key: CASSANDRA-6477
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
> Project: Cassandra
> Issue Type: New Feature
> Components: API, Core
> Reporter: Jonathan Ellis
> Assignee: Carl Yeksigian
> Labels: cql
> Fix For: 3.0 beta 1
>
> Attachments: test-view-data.sh, users.yaml
>
>
> Local indexes are suitable for low-cardinality data, where spreading the
> index across the cluster is a Good Thing. However, for high-cardinality
> data, local indexes require querying most nodes in the cluster even if only a
> handful of rows is returned.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)