[ https://issues.apache.org/jira/browse/CASSANDRA-7542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063405#comment-14063405 ]
Benedict commented on CASSANDRA-7542:
-------------------------------------

It seems to me the first and simplest improvement is to bound the _cost_ of contention by tracking, say, the ~90th percentile of paxos round time, and to sleep for anywhere between this time and twice this time once we detect contention.

Secondly, we could potentially have a lock within each C* process, where for some period of time only one paxos request may be in flight for the given partition+host (as it looks to me that a read against the same host as a write can cause the write's paxos round to be interrupted, so the host fights with itself... though I need to digest the code a little more to be sure about this one).

> Reduce CAS contention
> ---------------------
>
>                 Key: CASSANDRA-7542
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7542
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: sankalp kohli
>            Assignee: Benedict
>             Fix For: 2.0.10
>
>
> CAS updates on the same CQL partition can lead to heavy contention inside C*.
> I am looking for simple ways (no algorithmic changes) to reduce contention, as
> its penalty is high in terms of latency, especially for reads.
> We can put some sort of synchronization on the CQL partition at the
> StorageProxy level. This will reduce contention at least for all requests
> landing on one box for the same partition.
> Here is an example of why it will help:
> 1) Say 1 write and 2 read CAS requests for the same partition key are sent to
> C* in parallel.
> 2) Since the client is token-aware, it sends these 3 requests to the same C*
> instance A. (Let's assume that all 3 requests go to the same instance A.)
> 3) In this C* instance A, all 3 CAS requests will contend with each other in
> Paxos. (This is bad.)
> To reduce contention in 3), what I am proposing is to add a lock on the
> partition key, similar to what we do in PaxosState.java, to serialize these 3
> requests. This will remove the contention and improve performance, as these 3
> requests will not collide with each other.
> Another improvement we can make in the client is to pick a deterministic live
> replica for a given partition doing CAS.
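
For illustration only, the two ideas discussed above (a randomized sleep between the ~90th percentile round time and twice that value, and a per-host lock that serializes CAS requests for the same partition) could be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not Cassandra code: the class names CasContentionSketch, ContentionBackoff and StripedPartitionLocks, the sliding-window size, the nearest-rank percentile, and the fixed array of lock stripes are all hypothetical choices made for the example.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names exist in Cassandra. */
public class CasContentionSketch {

    /** Sliding window of recent paxos round times, used to derive a backoff. */
    static final class ContentionBackoff {
        private final long[] samples; // ring buffer of round times in nanoseconds
        private int next;             // next slot to overwrite
        private int size;             // number of valid samples

        ContentionBackoff(int windowSize) {
            samples = new long[windowSize];
        }

        synchronized void record(long roundNanos) {
            samples[next] = roundNanos;
            next = (next + 1) % samples.length;
            if (size < samples.length) size++;
        }

        /** Nearest-rank 90th percentile of the recorded samples (0 if none). */
        synchronized long p90Nanos() {
            if (size == 0) return 0;
            long[] sorted = Arrays.copyOf(samples, size);
            Arrays.sort(sorted);
            int rank = (9 * size + 9) / 10; // ceil(0.9 * size) in integer math
            return sorted[rank - 1];
        }

        /** Random delay between p90 and 2 * p90 (inclusive), to use once contention is detected. */
        long backoffNanos() {
            long p90 = p90Nanos();
            return p90 <= 0 ? 0 : ThreadLocalRandom.current().nextLong(p90, 2 * p90 + 1);
        }
    }

    /** Fixed set of locks; keys hashing to the same stripe share a lock. */
    static final class StripedPartitionLocks {
        private final ReentrantLock[] stripes;

        StripedPartitionLocks(int stripeCount) {
            stripes = new ReentrantLock[stripeCount];
            for (int i = 0; i < stripeCount; i++) stripes[i] = new ReentrantLock();
        }

        /** Runs the operation while holding the lock for the given partition key. */
        <T> T withLock(Object partitionKey, Supplier<T> casOperation) {
            ReentrantLock lock = stripes[Math.floorMod(partitionKey.hashCode(), stripes.length)];
            lock.lock();
            try {
                return casOperation.get();
            } finally {
                lock.unlock();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ContentionBackoff backoff = new ContentionBackoff(100);
        for (long ms = 1; ms <= 10; ms++) backoff.record(ms * 1_000_000L);
        System.out.println("p90 (ns): " + backoff.p90Nanos());
        System.out.println("backoff (ns): " + backoff.backoffNanos());

        // Three concurrent "CAS requests" for one partition, serialized on this host.
        StripedPartitionLocks locks = new StripedPartitionLocks(16);
        int[] counter = {0};
        Thread[] threads = new Thread[3];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    locks.withLock("partition-1", () -> counter[0]++);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("serialized operations: " + counter[0]);
    }
}
```

The striped array keeps memory bounded at the cost of occasionally serializing unrelated partitions that hash to the same stripe. A lock like this only helps requests that land on the same host, which is why the description also suggests having the client pick a deterministic live replica.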