[jira] [Comment Edited] (CASSANDRA-10230) Remove coordinator batchlog from materialized views

Joel Knighton (JIRA) Mon, 14 Sep 2015 13:37:45 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-10230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743721#comment-14743721 ]


Joel Knighton edited comment on CASSANDRA-10230 at 9/14/15 8:35 PM:
--------------------------------------------------------------------

I'm just now finishing up some tests for this.

The test methodology is as follows:
1.  Tune the failure detector so that network partitions of the durations used 
in the test do not cause nodes to detect that the cluster is partitioned.
2.  Partition the cluster so nodes 1 and 2 can communicate and nodes 3 and 4 
can communicate.
3.  Write to the cluster at CL.ONE.
4.  Heal the partition for three seconds, then completely partition the 
cluster and perform a read with a client connected to each node as 
coordinator, so that we know each read is served only by its coordinator.

Currently, testing shows that both hints and batchlogs successfully propagate 
all writes to all replicas of the base tables without data loss. If both hints 
and batchlogs are disabled, writes will never propagate (since read repair is 
impossible).
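
As a rough illustration of the methodology and the result above, here is a toy 
Python model. It is only a sketch under stated assumptions: every node is 
treated as a replica for every write, hints are modeled as a per-coordinator 
queue that is replayed when the partition heals, and the batchlog is not 
modeled at all. The names ToyCluster, Node and run are hypothetical and do not 
correspond to anything in Cassandra or in the Jepsen test.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    data: set = field(default_factory=set)     # values this replica has applied
    hints: dict = field(default_factory=dict)  # target node name -> values to replay


class ToyCluster:
    """Toy model: every node is a replica for every write (an assumption)."""

    def __init__(self, names, hints_enabled):
        self.nodes = {n: Node(n) for n in names}
        self.hints_enabled = hints_enabled
        self.groups = [set(names)]  # fully connected to start

    def partition(self, *groups):
        """Replace the connectivity groups; nodes talk only within a group."""
        self.groups = [set(g) for g in groups]

    def reachable(self, a, b):
        return any(a in g and b in g for g in self.groups)

    def write(self, coordinator, value):
        """CL.ONE-style write: applied on every replica the coordinator can reach."""
        coord = self.nodes[coordinator]
        for name, node in self.nodes.items():
            if self.reachable(coordinator, name):
                node.data.add(value)
            elif self.hints_enabled:
                # Unreachable replica: keep a hint on the coordinator.
                coord.hints.setdefault(name, set()).add(value)

    def replay_hints(self):
        """Deliver stored hints to any target that is reachable again."""
        for coord in self.nodes.values():
            for target in list(coord.hints):
                if self.reachable(coord.name, target):
                    self.nodes[target].data |= coord.hints.pop(target)

    def local_read(self, name):
        """Read served only from the coordinator's own copy."""
        return set(self.nodes[name].data)


def run(hints_enabled):
    names = ["n1", "n2", "n3", "n4"]
    cluster = ToyCluster(names, hints_enabled)
    cluster.partition({"n1", "n2"}, {"n3", "n4"})  # step 2
    cluster.write("n1", "a")                       # step 3, one write per side
    cluster.write("n3", "b")
    cluster.partition(set(names))                  # step 4: heal briefly
    cluster.replay_hints()
    cluster.partition(*[{n} for n in names])       # then isolate every node
    return {n: cluster.local_read(n) for n in names}
```

In this model, run(True) leaves every node holding both writes, while 
run(False) leaves nodes 1 and 2 with only the first write and nodes 3 and 4 
with only the second, since nothing repairs the difference once the nodes are 
isolated.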

Graphs showing a rough estimate of propagation time (the time of the read on 
the final replica minus the time of the read on the first replica) will be 
uploaded shortly, along with a link to the Jepsen test.
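
The propagation estimate described above could be computed along these lines. 
This is a minimal sketch, not the code used in the Jepsen test: it assumes that 
for each write we have recorded, per replica, the time at which a read first 
returned that write. The function name and the input shape are hypothetical.

```python
def propagation_time(first_seen):
    """Return the latest minus the earliest read time across replicas.

    first_seen maps a replica name to the time at which a read on that
    replica first returned the write (same units for all entries).
    """
    if not first_seen:
        raise ValueError("need at least one replica timestamp")
    times = list(first_seen.values())
    return max(times) - min(times)
```

For example, propagation_time({"n1": 10.0, "n2": 10.5, "n3": 42.0, "n4": 12.25}) 
returns 32.0.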

EDIT: Because of how the tests work, the graphs showing the relative latency of 
these approaches don't provide much information, only that each approach 
converges within at most a couple hundred seconds.

The test is available 
[here|https://github.com/riptano/jepsen/blob/c07b54041223f9836fc9c359239a1622b64b3415/cassandra/test/cassandra/mv_test.clj#L61]



> Remove coordinator batchlog from materialized views
> ---------------------------------------------------
>
>                 Key: CASSANDRA-10230
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10230
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: T Jake Luciani
>            Assignee: Joel Knighton
>             Fix For: 3.0.0 rc1
>
>
> We are considering removing or making optional the coordinator batchlog.  
> The batchlog primarily serves as a way to quickly reach consistency between 
> base and view, since we don't have any kind of read repair between base and 
> view. But we do have repair, so as long as you don't lose nodes while writing 
> at CL.ONE you will be eventually consistent.
> I've committed to the 3.0 branch a way to disable the coordinator batchlog 
> with {{-Dcassandra.mv_disable_coordinator_batchlog=true}}
> The majority of the performance hit to throughput currently comes from the 
> batchlog, as shown by this chart:
> http://cstar.datastax.com/graph?stats=f794245a-4d9d-11e5-9def-42010af0688f&metric=op_rate&operation=1_user&smoothing=1&show_aggregates=true&xmin=0&xmax=498.52&ymin=0&ymax=50142.4
> I'd like to have tests run with and without this flag to validate how quickly 
> we achieve quorum consistency without repair when writing at CL.ONE. Once we 
> can see there is little or no impact, we can permanently remove the 
> coordinator batchlog.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
