[ 
https://issues.apache.org/jira/browse/CASSANDRA-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tyler Hobbs updated CASSANDRA-10993:
------------------------------------
    Attachment: tpc-benchmarks-2.txt

I've attached {{tpc-benchmarks-2.txt}}, which includes the original benchmark 
numbers in addition to benchmarks on the RxJava approach (CASSANDRA-10528).  I 
also experimented with a few different settings, such as a 2GB newgen and 
setting {{connectionsPerHost=16}} for stress.  Those numbers are included in 
the txt file, but they didn't meaningfully change the results, so I'll ignore 
them here.
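
For reference, this is roughly how that option is passed on the stress command 
line (illustrative only; not the exact invocation used for these runs):

{noformat}
tools/bin/cassandra-stress read n=1000000 -rate threads=256 \
    -mode native cql3 connectionsPerHost=16 -node <coordinator-address>
{noformat}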

The benchmark setup is the same as in my previous comment.  To clarify, the 
stress clients are running on m3.xlarge instances, and C* is running on a 
c4.3xlarge.

For 10993, I found the best throughput with the number of Netty worker/event 
loop threads set to 14 (num_cores - 2).  For RxJava, 32 Netty workers gave the 
best performance.
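
For anyone reproducing this, pinning the worker count in a prototype can be as 
simple as sizing the event loop group explicitly.  This is a minimal sketch 
using the stock Netty API, not the actual wiring in either branch:

{code:java}
import io.netty.channel.nio.NioEventLoopGroup;

public class WorkerCountSketch
{
    public static void main(String[] args)
    {
        // 14 = num_cores - 2 for the setup above; left unspecified, Netty
        // defaults to 2 * num_cores event loop threads.
        NioEventLoopGroup workers = new NioEventLoopGroup(14);
        System.out.println("event loop threads: " + workers.executorCount());
        workers.shutdownGracefully();
    }
}
{code}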

All three branches are based on the 
{{007a3f57a328eaace19daca54901ccb70522d89f}} commit from trunk.

Here's a summary of the benchmark results:
* The RxJava branch and 10993 are basically tied at 229k ops/sec.  Trunk gets 
199k ops/sec.
* Latency for RxJava is very similar to trunk.
* Median latency for 10993 is lower than trunk (~16ms vs ~22ms).
* Tail latencies for 10993 are higher than trunk:
** 95th: 71ms vs 47ms on trunk
** 99th: 160ms vs 111ms on trunk
** 99.9th: 435ms vs 225ms on trunk

Obviously, these benchmarks only measure one particular case (reads from the 
memtable), but I think they demonstrate a couple of things:
* Under ideal circumstances (in-memory reads, and probably writes), we can 
expect to see about a 15% improvement in throughput over trunk with changes 
similar to the prototypes.  Further improvements, such as non-concurrent 
memtable data structures, could build on that.
* Performance with the RxJava/reactive-streams approach would be in the same 
ballpark as the state-machine based approach.

Given that the state machine approach and the reactive-streams approach perform 
similarly, I think we should look closely at the amount of work each would 
involve, along with the merits and challenges of each.

In my opinion, going with reactive streams would require less work, and the 
work would generally be less invasive.  There are quite a few nice things in 
RxJava that we would need to implement by hand with the state machine approach 
(see the sketch after this list):
* Operators/transforms for async results
* Nice ways to await multiple concurrent async operations
* Tracing
* Sophisticated error handling (e.g. 
[Catch/onErrorResumeNext|http://reactivex.io/documentation/operators/catch.html])
* Docs and examples
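
To make the first two items concrete, here's a minimal, self-contained sketch 
of combining concurrent async operations with a timeout and Catch-style error 
recovery using stock RxJava 1.x operators (the values and the fallback are 
purely illustrative):

{code:java}
import java.util.concurrent.TimeUnit;
import rx.Observable;

public class RxCombineSketch
{
    public static void main(String[] args)
    {
        // Two hypothetical concurrent async operations (e.g. replica reads),
        // each completing on a scheduler thread after a short delay.
        Observable<String> local = Observable.just("local-row")
                                             .delay(10, TimeUnit.MILLISECONDS);
        Observable<String> remote = Observable.just("remote-row")
                                              .delay(20, TimeUnit.MILLISECONDS);

        // Await both results, bound the combined operation with a timeout,
        // and recover from any error with a fallback (Catch/onErrorResumeNext).
        Observable<String> resolved =
            Observable.zip(local, remote, (l, r) -> l + " + " + r)
                      .timeout(100, TimeUnit.MILLISECONDS)
                      .onErrorResumeNext(err -> Observable.just("fallback"));

        System.out.println(resolved.toBlocking().single());
    }
}
{code}

Each of those operators would be a non-trivial chunk of hand-written machinery 
in the state machine approach.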

It's also worth mentioning that reactive streams are being standardized in the 
[Java 9 Flow APIs|https://github.com/ReactiveX/RxJava/wiki/Reactive-Streams], 
and migrating to that standard later should be easy.
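
For a feel for the standard's shape, here's a minimal sketch against the 
{{java.util.concurrent.Flow}} API, assuming it lands in JDK 9 as currently 
proposed; none of this is Cassandra code:

{code:java}
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

public class FlowSketch
{
    public static void main(String[] args) throws InterruptedException
    {
        try (SubmissionPublisher<String> publisher = new SubmissionPublisher<>())
        {
            publisher.subscribe(new Flow.Subscriber<String>()
            {
                private Flow.Subscription subscription;

                public void onSubscribe(Flow.Subscription s)
                {
                    subscription = s;
                    s.request(1); // backpressure: request one item at a time
                }

                public void onNext(String item)
                {
                    System.out.println("got: " + item);
                    subscription.request(1);
                }

                public void onError(Throwable t) { t.printStackTrace(); }

                public void onComplete() { System.out.println("done"); }
            });

            publisher.submit("read-response");
        }
        Thread.sleep(100); // give the async delivery a moment to finish
    }
}
{code}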

On the other hand, some might say that the more explicit state machine approach 
is easier to follow and reason about.  While this is true in simple cases, what 
I've found while working on the 10993 prototype doesn't entirely support it.  
For example, the simplest type of request is probably a single-partition write 
request.  Even this is complicated to represent as a single state machine, 
because the local write path is essentially a separate state machine that 
proceeds concurrently with the remote write path.  The read path, with 
complications like speculative retry and read repair, is quite a bit more 
elaborate.  So I'm not sure that we will always be able to present a clear 
state-machine based mental model of operations.
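
To illustrate the write-path point, here's a toy sketch (the class, states, and 
methods are invented for illustration and are not from the 10993 branch).  Even 
in this stripped-down form, the "single" write state machine is really two 
sub-machines that advance independently and meet at completion:

{code:java}
// Toy model of a single-partition write: the local apply and the remote
// replica acks advance concurrently, so there is no single linear FSM.
public class WriteStateSketch
{
    enum LocalState { APPLYING, APPLIED, FAILED }
    enum RemoteState { AWAITING_ACKS, ACKED, TIMED_OUT }

    private LocalState local = LocalState.APPLYING;
    private RemoteState remote = RemoteState.AWAITING_ACKS;

    // Events arrive from different threads/event loops; each advances only
    // its own sub-machine, then checks the combined completion condition.
    public synchronized void onLocalApplied()
    {
        local = LocalState.APPLIED;
        maybeComplete();
    }

    public synchronized void onRemoteAcksReceived()
    {
        remote = RemoteState.ACKED;
        maybeComplete();
    }

    private void maybeComplete()
    {
        if (local == LocalState.APPLIED && remote == RemoteState.ACKED)
            System.out.println("write acknowledged at the requested CL");
    }
}
{code}

Add hints, per-replica failures, and timeouts, and the combined state space 
grows quickly.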

I think the other main argument for the explicit state machine approach is 
performance.  While there are surely cases where the state machine approach 
(where we have 100% control) would outperform a reactive-streams approach, it's 
difficult to predict those cases, or how large the differences might be, 
without fully implementing both approaches.  Given the amount of time I believe 
we would save by going with reactive streams, we would probably have extra time 
to mitigate any bad edge cases.

With all that said, I'm not 100% sold on going with either approach.  I'd like 
to get some input on this.  [~tjake] [~slebresne] [~iamaleksey] [~jbellis], and 
anybody else who is interested, what do you think?

> Make read and write requests paths fully non-blocking, eliminate related 
> stages
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10993
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10993
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Coordination, Local Write-Read Paths
>            Reporter: Aleksey Yeschenko
>            Assignee: Tyler Hobbs
>             Fix For: 3.x
>
>         Attachments: 10993-reads-no-evloop-integration-six-node-stress.svg, 
> tpc-benchmarks-2.txt, tpc-benchmarks.txt
>
>
> Building on work done by [~tjake] (CASSANDRA-10528), [~slebresne] 
> (CASSANDRA-5239), and others, convert read and write request paths to be 
> fully non-blocking, to enable the eventual transition from SEDA to TPC 
> (CASSANDRA-10989).
> Eliminate {{MUTATION}}, {{COUNTER_MUTATION}}, {{VIEW_MUTATION}}, {{READ}}, 
> and {{READ_REPAIR}} stages, move read and write execution directly to Netty 
> context.
> For lack of decent async I/O options on Linux, we’ll still have to retain an 
> extra thread pool for serving read requests for data not residing in our page 
> cache (CASSANDRA-5863), however.
> Implementation-wise, we only have two options available to us: explicit FSMs 
> and chained futures. Fibers would be the third, and easiest option, but 
> aren’t feasible in Java without resorting to direct bytecode manipulation 
> (ourselves or using [quasar|https://github.com/puniverse/quasar]).
> I have seen 4 implementations based on chained futures/promises now - three 
> in Java and one in C++ - and I’m not convinced that it’s the optimal (or 
> sane) choice for representing our complex logic - think 2i quorum read 
> requests with timeouts at all levels, read repair (blocking and 
> non-blocking), and speculative retries in the mix, {{SERIAL}} reads and 
> writes.
> I’m currently leaning towards an implementation based on explicit FSMs, and 
> intend to provide a prototype - soonish - for comparison with 
> {{CompletableFuture}}-like variants.
> Either way, the transition is a relatively boring, straightforward refactoring.
> There are, however, some extension points on both write and read paths that 
> we do not control:
> - authorisation implementations will have to be non-blocking. We have control 
> over built-in ones, but for any custom implementation we will have to execute 
> them in a separate thread pool
> - 2i hooks on the write path will need to be non-blocking
> - any trigger implementations will not be allowed to block
> - UDFs and UDAs
> We are further limited by API compatibility restrictions in the 3.x line, 
> forbidding us from altering, or adding any non-{{default}}, interface methods 
> to those extension points, so these pose a problem.
> Depending on logistics, I expect to get this done in time for the 3.4 or 3.6 
> feature release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
