[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633413#comment-14633413 ]

Benedict edited comment on CASSANDRA-6477 at 7/20/15 10:53 AM:
---------------------------------------------------------------

I must admit, I'm becoming less and less convinced by the idea (admittedly my 
own) of proxying on to only one node. From an availability perspective, it 
seems very likely to induce a persistent mismatch between the base and MV 
replicas. With only two node failures anywhere in the cluster, we pretty much 
guarantee that portions of the base table and the MV begin to diverge at 
QUORUM when using vnodes (even without vnodes, the chance is 1/RF). 
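
To see the mechanism on a single token range, here is a toy enumeration (my 
own illustration, assuming RF=3, disjoint base and MV replica sets, and a 
fixed pairing of base replica i to MV replica i; it illustrates the mechanism 
only, not the cluster-wide probability):

{code:python}
# Toy enumeration of one token range: base replicas 0..2, MV replicas 0..2,
# where base replica i proxies the MV update only to its paired MV replica i.
# Assumption (mine, for illustration): exactly one base node and one MV node fail.
from itertools import product

RF = 3
QUORUM = RF // 2 + 1  # 2

bad = 0
for failed_base, failed_mv in product(range(RF), range(RF)):
    base_up = RF - 1  # exactly one base replica is down
    # MV replica i receives the update only if its paired base replica is up
    # (so that it proxies the write) and the MV replica itself is up
    mv_received = sum(1 for i in range(RF)
                      if i != failed_base and i != failed_mv)
    if base_up >= QUORUM and mv_received < QUORUM:
        bad += 1

# 6 of 9: whenever the failed base and the failed MV replica are not paired
# with each other, only one MV copy lands even though the base is at QUORUM
print(bad, "of", RF * RF)
{code}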

Since every node is a base replica and an MV replica, with vnodes we are likely 
to be replicating portions of our share of the base table on to every other 
node in the cluster as an MV replica: 
* for any single token range we will have picked a single node A to share with;
* however, we cannot be certain that for every token range that replicates to A 
we will select A, as this would require a great deal of cooperation and forward 
planning (if it is even possible at all; I'm not sure we can guarantee it 
without incorporating it into a token allocation strategy, but I haven't 
thought about it extensively)
* as such, for any node we must assume it replicates to roughly as many 
distinct nodes as it has vnodes (actually a little fewer, à la the birthday 
paradox, but for simplicity let's assume the worst), which typically will 
mean the whole cluster
* so, if we lose any two nodes, one of those nodes will be a base replica, and 
the other will be an MV replica _that is paired with one of the other base 
replicas_ for one of the token ranges in the cluster
* so, we will reach QUORUM for the base replica, but not for the affected MV 
replica token range

edit: this is because the base replica that has failed obviously won't proxy 
on the operation, and the failed MV replica obviously won't receive it; since 
they're disjoint (i.e. not paired with each other), only one MV replica 
receives the data. One more failure in the cluster (with vnodes) and I think 
we're actually pretty darn likely (I haven't thought the maths through 
exactly) to have around {{1/RF^2}} of the token ranges fail to receive _any_ 
of their updates, despite reaching QUORUM on the base table.
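
To put rough numbers on the two-failure case, here is a quick Monte Carlo 
sketch (again a toy model of my own: RF=3 for base and MV, replica sets 
chosen uniformly at random per token range, and base replica i statically 
paired with MV replica i, which alone proxies the MV update to it):

{code:python}
# Toy Monte Carlo sketch of the two-failure scenario described above.
# Model assumptions are mine, for illustration only; the absolute numbers
# depend entirely on the toy placement model.
import random

def diverged_ranges(nodes=100, ranges=20000, rf=3, failures=2, seed=0):
    rnd = random.Random(seed)
    quorum = rf // 2 + 1
    down = set(rnd.sample(range(nodes), failures))  # any two failed nodes
    diverged = 0
    for _ in range(ranges):
        base = rnd.sample(range(nodes), rf)         # base replica nodes
        mv = rnd.sample(range(nodes), rf)           # paired MV replica nodes
        if sum(b not in down for b in base) < quorum:
            continue                                # base write fails outright
        # MV replica i receives the update only if both its paired base
        # replica and the MV replica itself are up
        received = sum(b not in down and m not in down
                       for b, m in zip(base, mv))
        if received < quorum:
            diverged += 1                           # base at QUORUM, MV not
    return diverged

# For essentially any pair of failed nodes, some token ranges end up with the
# base accepting at QUORUM while a quorum of MV replicas never sees the
# update; the count below is practically never zero in this toy model.
print(diverged_ranges())
{code}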


> Materialized Views (was: Global Indexes)
> ----------------------------------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0 beta 1
>
>         Attachments: test-view-data.sh, users.yaml
>
>
> Local indexes are suitable for low-cardinality data, where spreading the 
> index across the cluster is a Good Thing.  However, for high-cardinality 
> data, local indexes require querying most nodes in the cluster even if only a 
> handful of rows is returned.


