[jira] [Commented] (CASSANDRA-6477) Materialized Views (was: Global Indexes)

Benedict (JIRA) Fri, 17 Jul 2015 14:00:25 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631891#comment-14631891
 ]


Benedict commented on CASSANDRA-6477:
-------------------------------------

OK, so I think we have a lot of competing goals that are not being discussed in 
a clearly delineated fashion:

# How do we ensure the MV is not corrupted?
# How do we maintain AP?
# How do we best honour the user's consistency level?
# How do we do it quickly?

Unfortunately, goals (1) and (2) are directly opposed to each other. I hadn't 
originally envisaged having 20+ MVs, but if that's on the cards, (1) really must 
reign supreme. Otherwise we will end up with a {{CL.LITERALLY_EVERYONE}} 
scenario.

The problem here is that the coordinator-level batch log is almost useless, and 
in fact I think *absolutely guaranteeing* no MV corruption is nigh impossible. 
If the base table on a replica has been updated, but its paired MV has not, it 
doesn't matter what the coordinator does with replay, as the base replica will 
not apply any delta.

So, I'm pretty sure we can drop the coordinator batch log. What I propose 
instead is to always ensure an owner is the coordinator: if a non-owner 
receives the write, it just proxies it on to one owning node (or more, if a 
response is slow), which then acts as coordinator.
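
The routing rule above can be sketched roughly as follows. This is only an illustration, not Cassandra code: {{choose_coordinators}}, the node names and the {{slow}} set are hypothetical, and a real implementation would react to timeouts rather than a precomputed set.

```python
from typing import FrozenSet, List


def choose_coordinators(receiver: str,
                        owners: List[str],
                        slow: FrozenSet[str] = frozenset()) -> List[str]:
    """Return the owning nodes that end up being asked to coordinate.

    If the receiving node owns the partition, it coordinates itself.
    Otherwise it proxies to the first owner, and to further owners only
    while the ones already contacted are slow to respond.
    """
    if receiver in owners:
        return [receiver]
    contacted = []
    for owner in owners:
        contacted.append(owner)
        if owner not in slow:
            # A responsive owner has taken over; stop proxying.
            break
    return contacted
```

Under these assumptions a non-owner never coordinates, and the fan-out grows only when an owner is slow.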

_Loosely_ speaking, it can then:

# write the mutation to the other base-replicas;
# perform its local read-before-write, and write to the local batchlog the 
total set of MV deltas it will apply (along with the base mutation);
# write to its paired replica for one of its MVs both the MV mutation _and_ the 
whole batchlog of mutations, including mutations for the other base replicas;
# once _any_ of these respond (the other base replicas, or the MV replica), 
we're good to go, as we have confidence that two or more base replicas will 
receive the mutation, and so we write to the remaining MV replicas.
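
To make the ordering of those four steps concrete, here is a toy, single-threaded simulation. It is a sketch under heavy assumptions: {{Node}}, {{coordinate}} and the message shapes are invented for illustration, the MV deltas are taken as already computed by the read-before-write, and the real writes would be asynchronous rather than sequential.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical stand-in for a replica (base or MV)."""
    name: str
    up: bool = True
    batchlog: List[dict] = field(default_factory=list)
    received: List[str] = field(default_factory=list)

    def send(self, message: str) -> bool:
        """Deliver a message; return True if the node acknowledged it."""
        if self.up:
            self.received.append(message)
        return self.up


def coordinate(base_mutation: str,
               mv_deltas: Dict[str, str],
               local: Node,
               other_base: List[Node],
               mv_replicas: Dict[str, Node]) -> bool:
    """Run the four steps; return True once any remote node has responded."""
    # Step 1: write the mutation to the other base replicas.
    acks = [replica.send(base_mutation) for replica in other_base]

    # Step 2: log the base mutation and all MV deltas locally, at once.
    entry = {"base": base_mutation, "deltas": dict(mv_deltas)}
    local.batchlog.append(entry)

    # Step 3: send one paired MV replica its delta plus the whole batchlog entry.
    first_view = next(iter(mv_deltas), None)
    if first_view is not None:
        first = mv_replicas[first_view]
        acked = first.send(mv_deltas[first_view])
        if acked:
            first.batchlog.append(entry)
        acks.append(acked)

    # Step 4: only after any response do we write to the remaining MV replicas.
    if not any(acks):
        return False
    for view, delta in mv_deltas.items():
        if view != first_view:
            mv_replicas[view].send(delta)
    return True
```

In this sketch the remaining MV replicas are untouched until at least one other node holds either the base mutation or the batchlog entry, which is the property step 4 relies on.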

The non-coordinator base-replicas just perform all of their work at once: they 
write their deltas (and their own mutation) to the _local_ batchlog, then write 
this batchlog to one of their MV replicas at the same time as sending _all_ of 
their updates. This is safe because we already know the coordinator has written 
this to its batchlog, and so our doing so is enough to reach QUORUM for commit.
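
For contrast with the coordinator, a non-coordinator base replica could be sketched like this. Again the names ({{apply_on_base_replica}}, the payload keys) are hypothetical; the function only builds the messages that would all be sent together, with the batchlog copy attached to one of them.

```python
from typing import Dict, List


def apply_on_base_replica(base_mutation: str,
                          mv_deltas: Dict[str, str],
                          local_batchlog: List[dict]) -> List[dict]:
    """Log locally, then return every outgoing MV message in one go."""
    # Write the deltas and the replica's own mutation to the local batchlog.
    entry = {"base": base_mutation, "deltas": dict(mv_deltas)}
    local_batchlog.append(entry)

    outbox = []
    for index, (view, delta) in enumerate(mv_deltas.items()):
        message = {"view": view, "delta": delta}
        if index == 0:
            # One MV replica also receives a copy of the batchlog entry.
            message["batchlog"] = entry
        outbox.append(message)
    # No waiting between messages: the coordinator has already logged this.
    return outbox
```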

The main advantage of this is that we have no synchronous operations, but we 
still _reasonably_ guarantee we eventually reach consistency - at least as well 
as we do currently (I'm not 100% familiar with how we write to the local batch 
log currently, but ensuring we write deltas at once for all MVs is critical to 
correctness). The reason we do not need to perform any synchronous batchlog 
writes is that if, for whatever reason, we lose all of the nodes involved, then 
it does not matter that the batchlog records never made it: neither did the 
base mutations, nor any follow-on MV mutations. The slate is wiped clean.

I have to say, though: I'm more than a little worried about how repair factors 
into this equation. I kind of suspect we need to guarantee all of these logs 
are empty before we run repair, or we need to produce deltas on receipt of 
repaired data. This is true of all of the outlined approaches.

> Materialized Views (was: Global Indexes)
> ----------------------------------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0 beta 1
>
>         Attachments: test-view-data.sh, users.yaml
>
>
> Local indexes are suitable for low-cardinality data, where spreading the 
> index across the cluster is a Good Thing.  However, for high-cardinality 
> data, local indexes require querying most nodes in the cluster even if only a 
> handful of rows is returned.



