[
https://issues.apache.org/jira/browse/CASSANDRA-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382056#comment-16382056
]
Kenneth Brotman commented on CASSANDRA-14268:
---------------------------------------------
From DataStax at
[https://www.datastax.com/dev/blog/understanding-materialized-views]
h1. Understanding the Guarantees, Limitations, and Tradeoffs of Cassandra and
Materialized Views
By [Jake Luciani|https://www.datastax.com/author/jakedatastax-com] - February
12, 2016
The new [Materialized Views
feature|https://www.datastax.com/dev/blog/new-in-cassandra-3-0-materialized-views]
in Cassandra 3.0 offers an easy way to accurately denormalize data so it can
be efficiently queried. It's meant to be used on high cardinality columns
where the use of [secondary indexes is not
efficient|http://docs.datastax.com/en/cql/3.3/cql/cql_using/useWhenIndex.html]
due to fan-out across all nodes. An example would be creating a secondary
index on a user_id: since secondary indexes store data locally, the more users
in the system, the longer a secondary index lookup takes to locate the data.
With a materialized view you can *partition* the data on user_id
so finding a specific user becomes a direct lookup with the added benefit of
holding other denormalized data from the base table along with it, similar to a
[DynamoDB global secondary
index|http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/GSI.html].
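As a sketch in CQL (the table and column names here are hypothetical, not from
the original post), the pattern looks like:

```sql
-- Base table, partitioned for the application's primary query.
CREATE TABLE users_by_group (
    group_id int,
    user_id  uuid,
    name     text,
    PRIMARY KEY (group_id, user_id)
);

-- View partitioned on user_id: finding one user is a direct partition
-- lookup instead of a fan-out across every node's local index.
CREATE MATERIALIZED VIEW users_by_id AS
    SELECT group_id, user_id, name
    FROM users_by_group
    WHERE user_id IS NOT NULL AND group_id IS NOT NULL
    PRIMARY KEY (user_id, group_id);
```

The IS NOT NULL restrictions are required: every column in the view's primary
key must be constrained to non-null values.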
Materialized views are a very useful feature to have in Cassandra but before
you go jumping in head first, it helps to understand how this feature was
designed and what the guarantees are.
Primarily, since materialized views live in Cassandra they can offer at most
what Cassandra offers, namely a highly available, eventually consistent version
of materialized views.
A quick refresher of the Cassandra guarantees and tradeoffs:
C* Guarantees:
* Writes to a single table are guaranteed to be eventually consistent across
replicas - meaning divergent versions of a row will be reconciled and reach the
same end state.
* [Lightweight
transactions|http://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlLtwtTransactions.html]
are guaranteed to be linearizable for table writes within a data center or
globally depending on the use of [LOCAL_SERIAL vs
SERIAL|http://docs.datastax.com/en/cassandra/3.x/cassandra/dml/dmlConfigSerialConsistency.html]
consistency level respectively.
* [Batched|http://docs.datastax.com/en/cql/3.0/cql/cql_reference/batch_r.html]
writes across multiple tables are guaranteed to succeed completely or not at
all (by using a durable log).
* [Secondary
indexes|http://docs.datastax.com/en/cql/3.3/cql/cql_using/useSecondaryIndex.html]
(once built) are guaranteed to be consistent with their local replica's data.
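As a quick illustration of the lightweight-transaction guarantee (the table
and column names are hypothetical):

```sql
-- A Paxos-backed insert-if-absent: linearizable per partition.
-- Using SERIAL makes the round global; LOCAL_SERIAL confines it
-- to the local data center.
INSERT INTO users (user_id, name)
VALUES (uuid(), 'alice')
IF NOT EXISTS;
```

The result includes an [applied] column indicating whether the conditional
write won.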
C* Limitations:
* Cassandra provides read uncommitted isolation by default. (Lightweight
transactions provide linearizable isolation)
C* Tradeoffs:
* Using lower consistency levels yields higher availability and better latency
at the price of weaker consistency.
* Using higher consistency levels yields lower availability and higher request
latency with the benefit of stronger consistency.
Another tradeoff to consider is how Cassandra deals with data safety in the
face of hardware failures. Say your disk dies or your datacenter has a fire
and you lose machines; how safe is your data? Well, it depends on a few
factors, mainly replication factor and consistency level used for the write.
With consistency level QUORUM and RF=3 your data is safe on at least two nodes
so if you lose one node you still have a copy. However, if you only have RF=1
and lose a node forever you've lost data forever.
An extreme example of this is if you have RF=3 but write at CL.ONE and the
write only succeeds on a single node, followed directly by the death of that
node. Unless the coordinator was a different node you probably just lost data.
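To make that concrete (the keyspace name is hypothetical), replication factor
is set per keyspace and consistency level per request, e.g. in cqlsh:

```sql
-- Three copies of every row in this keyspace.
CREATE KEYSPACE app
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

-- Require 2 of 3 replicas to acknowledge each read/write.
-- (CONSISTENCY is a cqlsh command; drivers set this per statement.)
CONSISTENCY QUORUM;
```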
Given Cassandra's system properties, maintaining materialized views +manually+
in your application is likely to create permanent inconsistencies between the
base table and the views. Your application would need to read the existing
state from Cassandra and then modify the views to clean up any updates to
existing rows. Besides the added latency, if other updates are going to the
same rows, your reads will race with them and fail to clean up all the state
changes. This is the scenario the [mvbench
tool|https://github.com/tjake/mvbench] compares against.
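A sketch of that manual read-then-update pattern (hypothetical tables: a users
base table denormalized by hand into users_by_email):

```sql
-- 1. Read current state so the stale view row can be removed.
SELECT email FROM users WHERE user_id = 42;

-- 2. Clean up the old view row and write the new state. A concurrent
--    writer changing users.email between steps 1 and 2 leaves an
--    orphaned row in users_by_email that is never cleaned up.
DELETE FROM users_by_email WHERE email = '[email protected]';
INSERT INTO users_by_email (email, user_id) VALUES ('[email protected]', 42);
UPDATE users SET email = '[email protected]' WHERE user_id = 42;
```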
The Materialized Views feature in Cassandra 3.0 was written to address these
and other complexities surrounding manual denormalization, but that is not to
say it's not without its own set of guarantees and tradeoffs to consider. To
understand the internal design of Materialized Views please read the [design
document|https://docs.google.com/document/d/1sK96wsE3uwFqzrLQju_spya6rOTxojOKR9N7W-rPwdw/edit?usp=sharing].
At a high level though we chose correctness over raw performance for writes,
but did our best to avoid needless write amplification. A simple way to think
about this write amplification problem is: if I have a *base* table with RF=3
and a *view* table with RF=3 a naive approach would send a write to each base
replica and each base replica would send a view update to each view replica;
RF+RF^2 writes per mutation (12 writes for RF=3)! C* Materialized Views
instead pairs each base replica with a *single* view replica. This simplifies
to RF+RF writes per mutation (6 writes for RF=3) while still guaranteeing
convergence.
Materialized View Guarantees:
* All changes to the base table will be eventually reflected in the view
tables unless there is a total data loss in the base table (as described in the
previous section)
Materialized View Limitations:
* All updates to the view happen asynchronously unless the corresponding view
replica is the same node. We must do this to ensure availability is not
compromised. It's easy to imagine a worst-case scenario of 10 materialized
views for which each update to the base table requires writing to 10 separate
nodes. Under normal operation views will see the data quickly, and there are
new metrics to track it (ViewWriteMetrics).
* There is no read repair between the views and the base table, meaning a read
repair on the view will only correct that view's data, not the base table's
data. If you are reading from the base table, though, read repair _will_ send
updates to both the base and the view.
* Mutations on a base table +partition+ must happen sequentially per replica
if the mutation touches a column in a view (this will improve after ticket
CASSANDRA-10307)
Materialized View Tradeoffs:
* With materialized views you are trading performance for correctness. It
takes more work to ensure the views will see all the state changes to a given
row; local locks and local reads are required. If you don't need consistency
or never update/delete data, you can bypass materialized views and simply
write to many tables from your client. There is also a ticket, CASSANDRA-9779,
that will offer a way to bypass the performance hit in the case of insert-only
workloads.
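For insert-only data, the client-side alternative mentioned above is just
writing each denormalized copy yourself (hypothetical tables); a logged batch
keeps the copies atomic:

```sql
BEGIN BATCH
    INSERT INTO events_by_user (user_id, day, payload)
        VALUES (42, '2016-02-12', 'login');
    INSERT INTO events_by_day (day, user_id, payload)
        VALUES ('2016-02-12', 42, 'login');
APPLY BATCH;
```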
* The data loss scenario described in the section above (a single remaining
copy on a single node that dies) has different effects depending on whether
the base or the view was affected. If view data was lost from all replicas you
would need to drop and re-create the view. If the base table lost data,
though, there would be an inconsistency between the base and the view, with
the view having data the base doesn't. Currently, there is no way to fix the
base from the view; ticket CASSANDRA-10346 was added to address this.
One final point on repair. As described in the [design
document|https://docs.google.com/document/d/1sK96wsE3uwFqzrLQju_spya6rOTxojOKR9N7W-rPwdw/edit#heading=h.c88v9p3byo75],
repairs mean different things depending on if you are repairing the base or
the view. If you repair *only* the view you will see a consistent state across
the view replicas (not the base). If you repair the base you will repair
*both* the base and the view. This is accomplished by passing streamed base
data through the regular write path, which in turn updates the views. This
is also how bootstrapping new nodes and SSTable loading work to provide
consistent materialized views.
> The Architecture:Guarantees web page is empty
> ---------------------------------------------
>
> Key: CASSANDRA-14268
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14268
> Project: Cassandra
> Issue Type: Improvement
> Components: Documentation and Website
> Reporter: Kenneth Brotman
> Priority: Major
>
> [http://cassandra.apache.org/doc/latest/architecture/guarantees.html]
> Please submit content and myself or someone else will take it from there.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)