[
https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194659#comment-14194659
]
Blake Eggleston commented on CASSANDRA-6246:
--------------------------------------------
I have an initial implementation here:
https://github.com/bdeggleston/cassandra/compare/CASSANDRA-6246?expand=1
It’s still pretty rough, I just wanted to get it to a point where we could get
a feel for the performance advantages and decide if the additional complexity
was worth it. There’s also none of the instance gc / optimized failure recovery
we’ve been talking about.
I did some performance comparisons over the weekend. The tldr is that epaxos is
10% to 11.5x faster than classic paxos, depending on the workload.
To test, I used a cluster of 3 m3.xlarge instances in us-east, and a 4th
instance executing queries against the cluster. Each C* node was in a different
az. Commit log and data directories were on different disks.
There were 2 tests, each running 10k queries against the cluster. The first
test measured throughput using queries that wouldn’t contend with each other.
Each query inserted a row for a different partition. The second test measured
performance under contention, where every query contended for the same
partition.
Each test was run with 1, 5, & 10 concurrent client requests.
With the uncontended workload, epaxos request time is 10-14% faster than the
current implementation on average.
See:
https://docs.google.com/spreadsheets/d/1olMYCepsE_02bMyfzV0Hke5UKuqoCNNjSIjR9yNs5iI/edit?pli=1#gid=0
With the contended workload, epaxos request time is 4.5x-11.5x faster than the
current implementation on average.
See:
https://docs.google.com/spreadsheets/d/1olMYCepsE_02bMyfzV0Hke5UKuqoCNNjSIjR9yNs5iI/edit?pli=1#gid=1327463955
There are 2 epaxos sections, regular, and cached. With higher contended request
concurrency, the execution algorithm has to visit a lot of unexecuted instances
to build it’s dependency graph. Reading the dependency data and instances out
of their tables and deserializing them for each visit slows down epaxos to a
point where it’s over twice as slow as classic paxos. By using a guava cache
for the instance and dependency data objects, and keeping them around for a few
minutes, epaxos is ~30x faster in higher contention/concurrency situations.
Some notes on the concurrent contended tests:
* The median query time for epaxos is a little slower than classic paxos for 5
concurrent contended requests. This is because epaxos is now doing an accept
phase on a lot of the queries, and because classic paxos doesn’t send commit
messages out if the predicate doesn’t apply to the query.
* With concurrent contending queries, 1-2.5% of the classic paxos queries
timeout and fail. At this level, there are no failing epaxos queries.
* Variance in query times is also much lower with epaxos. With 10 concurrent
contending requests, the 95th %ile request time for classic paxos is 23x the
median, epaxos is 1.8x.
> EPaxos
> ------
>
> Key: CASSANDRA-6246
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Blake Eggleston
> Priority: Minor
>
> One reason we haven't optimized our Paxos implementation with Multi-paxos is
> that Multi-paxos requires leader election and hence, a period of
> unavailability when the leader dies.
> EPaxos is a Paxos variant that requires (1) less messages than multi-paxos,
> (2) is particularly useful across multiple datacenters, and (3) allows any
> node to act as coordinator:
> http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
> However, there is substantial additional complexity involved if we choose to
> implement it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)