[jira] [Commented] (CASSANDRA-10423) Paxos/LWT failures when moving node

Sylvain Lebresne (JIRA) Thu, 17 Dec 2015 02:09:08 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15061831#comment-15061831
 ]


Sylvain Lebresne commented on CASSANDRA-10423:
----------------------------------------------

bq. Why will there be more contention when there are more participants?

That's actually not what I'm saying. What I'm saying is that for a fixed amount 
of contention, having more participants makes it more likely for Paxos rounds to 
interleave, which forces new rounds to be started and so makes any particular 
update slower. Or to put it another way, I'm just saying that conditional 
update performance is generally negatively impacted by a bigger RF (all other 
things being equal), and a move happens to temporarily bump the RF as far as 
Paxos is concerned.
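
To make the "bigger RF" point concrete, here is a rough sketch of the 
arithmetic only. It is not Cassandra's actual code: the function names are 
made up for illustration, and it assumes that each pending replica simply 
counts as one extra Paxos participant and that a round needs a simple majority 
of the participants.

```python
def paxos_quorum(participants: int) -> int:
    """Simple majority of the replicas taking part in a Paxos round."""
    if participants < 1:
        raise ValueError("participants must be positive")
    return participants // 2 + 1


def participants_during_move(replication_factor: int, pending_replicas: int) -> int:
    # Assumption (illustrative only): every pending replica created by the
    # move counts as one additional Paxos participant.
    return replication_factor + pending_replicas


# Compare a stable ring (no pending replicas) with a ring where a move
# has added one pending replica for the affected range.
for rf, pending in [(3, 0), (3, 1), (5, 0), (5, 1)]:
    n = participants_during_move(rf, pending)
    print(f"RF={rf} pending={pending}: participants={n}, quorum={paxos_quorum(n)}")
```

Under those assumptions, each round has to hear back from more replicas while 
the move is in progress, which leaves more room for a competing round to slip 
in between.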

But again, I'm not pretending that this explains the problem here at all, as the 
difference in performance observed by Roger during a move is more dramatic than 
what I would expect this explanation to account for. I just don't have any other 
idea of why a move would impact Paxos that badly, so we'd need to be able to 
reproduce this easily to find the problem.

> Paxos/LWT failures when moving node
> -----------------------------------
>
>                 Key: CASSANDRA-10423
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10423
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra version: 2.0.14
> Java-driver version: 2.0.11
>            Reporter: Roger Schildmeijer
>             Fix For: 2.1.x
>
>
> While moving a node (nodetool move <newtoken>) we noticed that lwt started 
> failing for some (~50%) requests. The java-driver (version 2.0.11) returned 
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout 
> during write query at consistency SERIAL (7 replica were required but only 0 
> acknowledged the write). The cluster was not under heavy load.
> I noticed that the failed lwt requests all took just above 1s. That 
> information and the WriteTimeoutException could indicate that this happens:
> https://github.com/apache/cassandra/blob/cassandra-2.0.14/src/java/org/apache/cassandra/service/StorageProxy.java#L268
> I can't explain why though. Why would there be more cas contention just 
> because a node is moving?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
