[
https://issues.apache.org/jira/browse/CASSANDRA-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382056#comment-16382056
]
Kenneth Brotman commented on CASSANDRA-14268:
---------------------------------------------
From DataStax at
[https://www.datastax.com/dev/blog/understanding-materialized-views]
h1. Understanding the Guarantees, Limitations, and Tradeoffs of Cassandra and
Materialized Views
By [Jake Luciani|https://www.datastax.com/author/jakedatastax-com] - February
12, 2016
The new [Materialized Views
feature|https://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views]
in Cassandra 3.0 offers an easy way to accurately denormalize data so it can
be efficiently queried. It's meant to be used on high cardinality columns
where the use of [secondary indexes is not
efficient|http://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html]
due to fan-out across all nodes. An example would be creating a secondary
index on a user_id: since secondary indexes store data locally, the more users
in the system, the longer a secondary index lookup takes to locate the data.
With a materialized view you can *partition* the data on user_id
so finding a specific user becomes a direct lookup with the added benefit of
holding other denormalized data from the base table along with it, similar to a
[DynamoDB global secondary
index|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html].
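As a sketch in CQL (the table and column names here are hypothetical, not from
the original post), the pattern looks like:

```sql
-- Base table, partitioned for the application's primary query.
CREATE TABLE users_by_group (
    group_id int,
    user_id  uuid,
    name     text,
    PRIMARY KEY (group_id, user_id)
);

-- View partitioned on user_id: finding one user is a direct partition
-- lookup instead of a fan-out across every node's local index.
CREATE MATERIALIZED VIEW users_by_id AS
    SELECT group_id, user_id, name
    FROM users_by_group
    WHERE user_id IS NOT NULL AND group_id IS NOT NULL
    PRIMARY KEY (user_id, group_id);
```

The IS NOT NULL restrictions are required: every column in the view's primary
key must be constrained to non-null values.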
Materialized views are a very useful feature to have in Cassandra but before
you go jumping in head first, it helps to understand how this feature was
designed and what the guarantees are.
Primarily, since materialized views live in Cassandra they can offer at most
what Cassandra offers, namely a highly available, eventually consistent version
of materialized views.
A quick refresher of the Cassandra guarantees and tradeoffs:
C* Guarantees:
* Writes to a single table are guaranteed to be eventually consistent across
replicas - meaning divergent versions of a row will be reconciled and reach the
same end state.
* [Lightweight
transactions|http://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlLtwtTransactions.html]
are guaranteed to be linearizable for table writes within a data center or
globally depending on the use of [LOCAL_SERIAL vs
SERIAL|http://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlConfigSerialConsistency.html]
consistency level respectively.
* [Batched|http://docs.datastax.com/en/cql/3.0/cql/cql_reference/batch_r.html]
writes across multiple tables are guaranteed to succeed completely or not at
all (by using a durable log).
* [Secondary
indexes|http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSecondaryIndex.html]
(once built) are guaranteed to be consistent with their local replica's data.
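As a quick illustration of the lightweight-transaction guarantee (the table
and column names are hypothetical):

```sql
-- A Paxos-backed insert-if-absent: linearizable per partition.
-- Using SERIAL makes the round global; LOCAL_SERIAL confines it
-- to the local data center.
INSERT INTO users (user_id, name)
VALUES (uuid(), 'alice')
IF NOT EXISTS;
```

The result includes an [applied] column indicating whether the conditional
write won.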
C* Limitations:
* Cassandra provides read uncommitted isolation by default. (Lightweight
transactions provide linearizable isolation)
C* Tradeoffs:
* Using lower consistency levels yields higher availability and better latency
at the price of weaker consistency.
* Using higher consistency levels yields lower availability and higher request
latency with the benefit of stronger consistency.
Another tradeoff to consider is how Cassandra deals with data safety in the
face of hardware failures. Say your disk dies or your datacenter has a fire
and you lose machines; how safe is your data? Well, it depends on a few
factors, mainly replication factor and consistency level used for the write.
With consistency level QUORUM and RF=3 your data is safe on at least two nodes
so if you lose one node you still have a copy. However, if you only have RF=1
and lose a node forever you've lost data forever.
An extreme example of this is if you have RF=3 but write at CL.ONE and the
write only succeeds on a single node, followed directly by the death of that
node. Unless the coordinator was a different node you probably just lost data.
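To make that concrete (the keyspace name is hypothetical), replication factor
is set per keyspace and consistency level per request, e.g. in cqlsh:

```sql
-- Three copies of every row in this keyspace.
CREATE KEYSPACE app
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Require 2 of 3 replicas to acknowledge each read/write.
-- (CONSISTENCY is a cqlsh command; drivers set this per statement.)
CONSISTENCY QUORUM;
```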
Given Cassandra's system properties, maintaining materialized views +manually+
in your application is likely to create permanent inconsistencies between the
base table and the views. Your application would need to read the existing
state from Cassandra and then modify the views to clean up any updates to
existing rows. Besides the added latency, if other updates are going to the
same rows, your reads will race with them and fail to clean up all the state
changes. This is the scenario the [mvbench
tool|https://github.com/tjake/mvbench] compares against.
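A sketch of that manual read-then-update pattern (hypothetical tables: a users
base table denormalized by hand into users_by_email):

```sql
-- 1. Read current state so the stale view row can be removed.
SELECT email FROM users WHERE user_id = 42;

-- 2. Clean up the old view row and write the new state. A concurrent
--    writer changing users.email between steps 1 and 2 leaves an
--    orphaned row in users_by_email that is never cleaned up.
DELETE FROM users_by_email WHERE email = '[email protected]';
INSERT INTO users_by_email (email, user_id) VALUES ('[email protected]', 42);
UPDATE users SET email = '[email protected]' WHERE user_id = 42;
```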
The Materialized Views feature in Cassandra 3.0 was written to address these
and other complexities surrounding manual denormalization, but that is not to
say it's not without its own set of guarantees and tradeoffs to consider. To
understand the internal design of Materialized Views please read the [design
document|https://docs.google.com/document/d/1sK96wsE3uwFqzrLQju_spya6rOTxojOKR9N7W-rPwdw/edit?usp=sharing].
At a high level though we chose correctness over raw performance for writes,
but did our best to avoid needless write amplification. A simple way to think
about this write amplification problem is: if I have a *base* table with RF=3
and a *view* table with RF=3 a naive approach would send a write to each base
replica and each base replica would send a view update to each view replica;
RF+RF^2 writes per mutation (12 writes for RF=3)! C* Materialized Views
instead pairs each base replica with a *single* view replica. This simplifies
to RF+RF writes per mutation (6 writes for RF=3) while still guaranteeing
convergence.
Materialized View Guarantees:
* All changes to the base table will be eventually reflected in the view
tables unless there is a total data loss in the base table (as described in the
previous section)
Materialized View Limitations:
* All updates to the view happen asynchronously unless the corresponding view
replica is the same node. We must do this to ensure availability is not
compromised. It's easy to imagine a worst-case scenario of 10 materialized
views for which each update to the base table requires writing to 10 separate
nodes. Under normal operation views will see the data quickly, and there are
new metrics to track it (ViewWriteMetrics).
* There is no read repair between the views and the base table, meaning a read
repair on the view will only correct that view's data, not the base table's
data. If you are reading from the base table, though, read repair _will_ send
updates to both the base and the view.
* Mutations on a base table +partition+ must happen sequentially per replica
if the mutation touches a column in a view (this will improve after ticket
CASSANDRA-10307)
Materialized View Tradeoffs:
* With materialized views you are trading performance for correctness. It
takes more work to ensure the views will see all the state changes to a given
row; local locks and local reads are required. If you don't need consistency
or never update/delete data, you can bypass materialized views and simply
write to many tables from your client. There is also a ticket, CASSANDRA-9779,
that will offer a way to bypass the performance hit in the case of insert-only
workloads.
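For insert-only data, the client-side alternative mentioned above is just
writing each denormalized copy yourself (hypothetical tables); a logged batch
keeps the copies atomic:

```sql
BEGIN BATCH
    INSERT INTO events_by_user (user_id, day, payload)
        VALUES (42, '2016-02-12', 'login');
    INSERT INTO events_by_day (day, user_id, payload)
        VALUES ('2016-02-12', 42, 'login');
APPLY BATCH;
```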
* The data loss scenario described in the section above (a single remaining
copy on a single node that dies) has different effects depending on whether
the base or the view was affected. If view data was lost from all replicas you
would need to drop and re-create the view. If the base table lost data,
though, there would be an inconsistency between the base and the view, with
the view having data the base doesn't. Currently, there is no way to fix the
base from the view; ticket CASSANDRA-10346 was added to address this.
One final point on repair. As described in the [design
document|https://docs.google.com/document/d/1sK96wsE3uwFqzrLQju_spya6rOTxojOKR9N7W-rPwdw/edit#heading=h.c88v9p3byo75],
repairs mean different things depending on if you are repairing the base or
the view. If you repair *only* the view you will see a consistent state across
the view replicas (not the base). If you repair the base you will repair
*both* the base and the view. This is accomplished by passing streamed base
data through the regular write path, which in turn updates the views. This
is also how bootstrapping new nodes and SSTable loading work to provide
consistent materialized views.
> The Architecture:Guarantees web page is empty
> ---------------------------------------------
>
> Key: CASSANDRA-14268
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14268
> Project: Cassandra
> Issue Type: Improvement
> Components: Documentation and Website
> Reporter: Kenneth Brotman
> Priority: Major
>
> [http://cassandra.apache.org/doc/latest/architecture/guarantees.html]
> Please submit content and myself or someone else will take it from there.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)