[ https://issues.apache.org/jira/browse/CASSANDRA-6477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633656#comment-14633656 ]

Benedict commented on CASSANDRA-6477:
-------------------------------------

bq. If you lose any 2 nodes with vnodes you can't achieve quorum anyway.

You're right, of course. Unfortunately, that only makes the situation worse. Since, 
as you say, we are physically incapable of reaching QUORUM for some portion of the 
ring, perhaps it doesn't matter that we significantly increase that portion for MVs, 
as there is always a portion for which that property holds. However, the portion may 
be significantly larger, and in fact with only two nodes failing we will not deliver 
some MV updates to _any_ node.

Let's say we have a cluster:

* With N nodes
* With R replication factor
* With 2 failing nodes
* With infinite vnodes (to simplify calculations)
* With F(1) the ratio of token ranges replicated on both failed nodes, i.e. that 
cannot reach quorum in the base table
* With F(2) the ratio of token ranges involving exactly one failed node in the base 
table

Now, when serving a write there are multiple scenarios:

* Of the F(1) writes, only one base replica receives the update; 2/N of the MV 
updates generated by this node would be routed to one of the now-gone nodes. So NO 
node receives ~2F(1)/N of MV updates.
* Of the F(2) writes, 2/N of the MV updates will target a dead node
* Of the remaining 1 - F(1) - F(2) writes, a fraction F(1) of their MV updates will 
be incapable of reaching QUORUM at the view, for the same reason the base table could 
not in the first case

Now, to derive F(1) and F(2) approximately, let's fix some basic cluster 
numbers: N=6, R=3. In this scenario F(1) is somewhere between 1/5 and 1/4. F(2) 
is approximately 4/9.
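
For anyone who wants to sanity-check these figures, here is a rough back-of-envelope 
sketch in Python (mine alone, nothing to do with the patch). It approximates infinite 
vnodes by treating each range's replica set as a uniform random R-subset of the N 
nodes, which is where the 1/5 lower bound for F(1) comes from, and simply takes 
F(2) ~ 4/9 as above rather than re-deriving it:

{code}
from itertools import combinations
from fractions import Fraction

N, R = 6, 3          # cluster size and replication factor
FAILED = {0, 1}      # any two fixed failed nodes

# Approximate infinite vnodes: each token range's replica set is a
# uniform random R-subset of the N nodes (the lower-bound model).
subsets = list(combinations(range(N), R))
f1 = Fraction(sum(1 for s in subsets if FAILED <= set(s)), len(subsets))
print("F(1) =", f1)                          # 1/5: both failed nodes replicate the range

f2 = Fraction(4, 9)                          # taken as stated above, not re-derived
lost_entirely  = 2 * f1 / N                  # 2F(1)/N   -> 1/15
dead_mv_target = f2 * Fraction(2, N)         # F(2)*2/N  -> 4/27
mv_no_quorum   = (1 - f1 - f2) * f1          # (16/45)*(1/5) -> 16/225
print(lost_entirely, dead_mv_target, mv_no_quorum)
{code}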

So, using the lower bound of F(1), we have:
* 1/15 of writes reach no MV node whatsoever
* 4/27 do not reach QUORUM within the MV because they target a dead node (there are 
two such MV updates per write, so ~27% of all writes)
* (16/45)*(1/5) = 16/225 cannot reach QUORUM at the MV (again two MV updates per 
write, so ~17% of all writes)

There are two MV updates per base write (delete + insert), so we have 27%+17% = 44% 
of writes failing to reach quorum, and 13% of writes failing to reach anyone. Compare 
that to the 20% of writes we would expect not to reach QUORUM without MVs, and 0% 
failing to reach anyone.
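
Spelling out the combination step (assuming a base write counts as failed if either 
of its two MV updates hits the problem, and taking the ~17% figure as given above):

{code}
from fractions import Fraction

def either_of_two(p):
    # chance that at least one of the two MV updates (delete + insert) is affected
    return 1 - (1 - p) ** 2

print(f"{float(either_of_two(Fraction(1, 15))):.0%} of writes reach no MV node")       # ~13%
print(f"{float(either_of_two(Fraction(4, 27))):.0%} of writes target a dead MV node")  # ~27%
print(f"~{27 + 17}% of writes failing to reach QUORUM at the MV in total")
{code}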

Either way, at the very least we need to ensure we repair our MV portion from 
our base replicas before we exit the JOINING state (or whatever the state is 
where we do not serve reads). Otherwise QUORUM will move backwards in time once 
a node comes back online.

Now, I'll grant I'm very rusty on these maths, so there are no doubt errors 
even in my simplification, but I think they represent the main thrust of the 
problem.

> Materialized Views (was: Global Indexes)
> ----------------------------------------
>
>                 Key: CASSANDRA-6477
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6477
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>            Assignee: Carl Yeksigian
>              Labels: cql
>             Fix For: 3.0 beta 1
>
>         Attachments: test-view-data.sh, users.yaml
>
>
> Local indexes are suitable for low-cardinality data, where spreading the 
> index across the cluster is a Good Thing.  However, for high-cardinality 
> data, local indexes require querying most nodes in the cluster even if only a 
> handful of rows is returned.



