[ https://issues.apache.org/jira/browse/CASSANDRA-6246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14194659#comment-14194659 ]
Blake Eggleston commented on CASSANDRA-6246:
--------------------------------------------

I have an initial implementation here:
https://github.com/bdeggleston/cassandra/compare/CASSANDRA-6246?expand=1

It’s still pretty rough; I just wanted to get it to a point where we could get a feel for the performance advantages and decide whether the additional complexity is worth it. It also has none of the instance gc / optimized failure recovery we’ve been talking about.

I did some performance comparisons over the weekend. The tl;dr is that epaxos is 10% to 11.5x faster than classic paxos, depending on the workload.

To test, I used a cluster of 3 m3.xlarge instances in us-east, and a 4th instance executing queries against the cluster. Each C* node was in a different az. Commit log and data directories were on different disks.

There were 2 tests, each running 10k queries against the cluster. The first test measured throughput using queries that wouldn’t contend with each other: each query inserted a row for a different partition. The second test measured performance under contention, where every query contended for the same partition. Each test was run with 1, 5, & 10 concurrent client requests.

With the uncontended workload, epaxos request time is 10-14% faster than the current implementation on average. See:
https://docs.google.com/spreadsheets/d/1olMYCepsE_02bMyfzV0Hke5UKuqoCNNjSIjR9yNs5iI/edit?pli=1#gid=0

With the contended workload, epaxos request time is 4.5x-11.5x faster than the current implementation on average. See:
https://docs.google.com/spreadsheets/d/1olMYCepsE_02bMyfzV0Hke5UKuqoCNNjSIjR9yNs5iI/edit?pli=1#gid=1327463955

There are 2 epaxos sections, regular and cached. With higher contended request concurrency, the execution algorithm has to visit a lot of unexecuted instances to build its dependency graph.
Reading the dependency data and instances out of their tables and deserializing them for each visit slows down epaxos to a point where it’s over twice as slow as classic paxos. By using a guava cache for the instance and dependency data objects, and keeping them around for a few minutes, epaxos is ~30x faster in higher contention/concurrency situations.

Some notes on the concurrent contended tests:

* The median query time for epaxos is a little slower than classic paxos for 5 concurrent contended requests. This is because epaxos is now doing an accept phase on a lot of the queries, and because classic paxos doesn’t send commit messages out if the predicate doesn’t apply to the query.
* With concurrent contending queries, 1-2.5% of the classic paxos queries time out and fail. At this level, there are no failing epaxos queries.
* Variance in query times is also much lower with epaxos. With 10 concurrent contending requests, the 95th percentile request time for classic paxos is 23x the median; for epaxos it is 1.8x.

> EPaxos
> ------
>
>         Key: CASSANDRA-6246
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-6246
>     Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>    Reporter: Jonathan Ellis
>    Assignee: Blake Eggleston
>    Priority: Minor
>
> One reason we haven't optimized our Paxos implementation with Multi-paxos is
> that Multi-paxos requires leader election and hence, a period of
> unavailability when the leader dies.
> EPaxos is a Paxos variant that requires (1) less messages than multi-paxos,
> (2) is particularly useful across multiple datacenters, and (3) allows any
> node to act as coordinator:
> http://sigops.org/sosp/sosp13/papers/p358-moraru.pdf
> However, there is substantial additional complexity involved if we choose to
> implement it.
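
For anyone reproducing the variance comparison in the comment (95th percentile request time relative to the median), here is a minimal sketch of how that ratio could be computed from raw per-request latencies. The function names, the nearest-rank percentile method, and the sample latencies are all assumptions made for illustration; they are not taken from the benchmark or from the linked spreadsheets.

```python
import math
from statistics import median


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def tail_ratio(samples, pct=95):
    """Ratio of the pct-th percentile latency to the median latency."""
    return percentile(samples, pct) / median(samples)


# Hypothetical request latencies in milliseconds (illustrative only).
steady = [10, 11, 12, 12, 13, 13, 14, 15, 16, 18]
spiky = [10, 11, 12, 12, 13, 13, 14, 15, 16, 300]

if __name__ == "__main__":
    print(f"steady p95/median: {tail_ratio(steady):.2f}")
    print(f"spiky  p95/median: {tail_ratio(spiky):.2f}")
```

A ratio close to 1 means most requests finish near the median time, while a large ratio points to a long tail of slow requests, which is the pattern the comment reports for classic paxos under contention.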