[ 
https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194659#comment-14194659
 ] 

Blake Eggleston commented on CASSANDRA-6246:
--------------------------------------------

I have an initial implementation here: 
https://github.com/bdeggleston/cassandra/compare/CASSANDRA-6246?expand=1

It’s still pretty rough; I just wanted to get it to a point where we could get 
a feel for the performance advantages and decide if the additional complexity 
is worth it. It also has none of the instance gc / optimized failure recovery 
we’ve been talking about.

I did some performance comparisons over the weekend. The tldr is that epaxos is 
anywhere from 10% faster (uncontended) to 11.5x faster (contended) than classic 
paxos, depending on the workload.

To test, I used a cluster of 3 m3.xlarge instances in us-east, and a 4th 
instance executing queries against the cluster. Each C* node was in a different 
az. Commit log and data directories were on different disks.

There were 2 tests, each running 10k queries against the cluster. The first 
test measured throughput using queries that wouldn’t contend with each other. 
Each query inserted a row for a different partition. The second test measured 
performance under contention, where every query contended for the same 
partition. 

Each test was run with 1, 5, & 10 concurrent client requests.
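
For anyone who wants to reproduce the shape of these two tests, here is a 
minimal sketch of a harness, not the code that was actually used. The 
execute_query callable, the key naming, and the use of a thread pool for client 
concurrency are all assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_benchmark(execute_query, keys, concurrency):
    """Run one query per key with `concurrency` workers; return latencies in seconds.

    `execute_query` is a hypothetical callable that issues a single query
    for the given partition key and blocks until it completes.
    """

    def timed(key):
        start = time.perf_counter()
        execute_query(key)
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed, keys))


def uncontended_keys(n):
    # Uncontended workload: every query targets a different partition.
    return [f"partition-{i}" for i in range(n)]


def contended_keys(n):
    # Contended workload: every query targets the same partition.
    return ["partition-0"] * n
```

Calling run_benchmark(execute_query, contended_keys(10000), 5) would correspond 
to the contended test at 5 concurrent client requests.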

With the uncontended workload, epaxos request time is 10-14% faster than the 
current implementation on average.
See: 
https://docs.google.com/spreadsheets/d/1olMYCepsE_02bMyfzV0Hke5UKuqoCNNjSIjR9yNs5iI/edit?pli=1#gid=0

With the contended workload, epaxos request time is 4.5x-11.5x faster than the 
current implementation on average.
See: 
https://docs.google.com/spreadsheets/d/1olMYCepsE_02bMyfzV0Hke5UKuqoCNNjSIjR9yNs5iI/edit?pli=1#gid=1327463955

The results have 2 epaxos sections: regular and cached. With higher contended 
request concurrency, the execution algorithm has to visit a lot of unexecuted 
instances to build its dependency graph. Reading the dependency data and 
instances out of their tables and deserializing them for each visit slows 
epaxos down to the point where it’s over twice as slow as classic paxos. Using 
a guava cache for the instance and dependency data objects, and keeping them 
around for a few minutes, makes epaxos ~30x faster in higher 
contention/concurrency situations.
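
To illustrate the caching idea, here is a rough sketch of a read-through cache 
with time-based expiry. It is a plain-Python stand-in, not the guava cache used 
in the branch. The ExpiringCache name, the loader callable, and expiring a 
fixed time after load (rather than after last access) are assumptions.

```python
import time


class ExpiringCache:
    """Read-through cache that reloads an entry once it is older than `ttl_seconds`.

    Hypothetical helper for illustration only. It is not thread-safe, and
    stale entries are only replaced lazily on the next read of the same key.
    """

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self._loader = loader      # slow path, e.g. table read + deserialization
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so expiry can be tested
        self._entries = {}         # key -> (value, loaded_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]        # fresh hit: skip the slow load
        value = self._loader(key)
        self._entries[key] = (value, now)
        return value
```

With something like this in front of the instance and dependency reads, 
repeated visits to the same unexecuted instance during graph traversal hit 
memory instead of the tables.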

Some notes on the concurrent contended tests:

* The median query time for epaxos is a little slower than classic paxos for 5 
concurrent contended requests. This is because epaxos is now doing an accept 
phase on a lot of the queries, and because classic paxos doesn’t send commit 
messages out if the predicate doesn’t apply to the query.
* With concurrent contending queries, 1-2.5% of the classic paxos queries 
time out and fail. At this level, there are no failing epaxos queries. 
* Variance in query times is also much lower with epaxos. With 10 concurrent 
contending requests, the 95th %ile request time for classic paxos is 23x the 
median; for epaxos it is 1.8x.
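
The 95th %ile / median ratio quoted above can be computed from raw latencies 
along these lines. This is a sketch: the tail_ratio name and the choice of the 
"inclusive" quantile method are assumptions, and the spreadsheet may have used 
a different percentile definition.

```python
import statistics


def tail_ratio(latencies, percentile=95):
    """Return the given percentile of `latencies` divided by their median.

    Needs at least two data points.
    """
    # quantiles(n=100) returns the 99 cut points P1..P99, so P95 is at index 94.
    cut_points = statistics.quantiles(latencies, n=100, method="inclusive")
    return cut_points[percentile - 1] / statistics.median(latencies)
```

A ratio close to 1 means the slow requests are not much slower than a typical 
one; a large ratio means a long tail.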


> EPaxos
> ------
>
>                 Key: CASSANDRA-6246
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Blake Eggleston
>            Priority: Minor
>
> One reason we haven't optimized our Paxos implementation with Multi-paxos is 
> that Multi-paxos requires leader election and hence, a period of 
> unavailability when the leader dies.
> EPaxos is a Paxos variant that requires (1) less messages than multi-paxos, 
> (2) is particularly useful across multiple datacenters, and (3) allows any 
> node to act as coordinator: 
> http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
> However, there is substantial additional complexity involved if we choose to 
> implement it.



