[jira] [Commented] (CASSANDRA-10423) Paxos/LWT failures when moving node

Sylvain Lebresne (JIRA) Thu, 01 Oct 2015 05:13:03 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939737#comment-14939737
 ]


Sylvain Lebresne commented on CASSANDRA-10423:
----------------------------------------------

bq. Why would there be more cas contention just because a node is moving?

When a node is "pending" (moving or bootstrapping), it is counted as a 
"paxos participant" (for basically the same reason as in CASSANDRA-833). 
More participants in the paxos rounds does mean potentially more risk that 
the algorithm fails to make progress due to contention. That could explain more 
timeouts while moving/bootstrapping a node, though I don't know if it 
explains going from no timeouts to 50% of requests timing out (assuming that's 
what you see). We probably need to be able to reproduce this if we want to find 
out the cause ([~enigmacurry], do you think you can find someone to look at 
it?), but for that, more information on your case would greatly help (like your 
number of nodes, replication settings, whether your usage of LWT is likely to 
contend a lot or not, ...).
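
To make the "more participants" point concrete, here is a minimal sketch. It 
assumes the required number of paxos participants is a simple majority over 
natural plus pending replicas. The function name and the formula are 
illustrative assumptions, not the actual StorageProxy code, and the exact 
computation in Cassandra may differ between versions.

```python
def required_participants(natural: int, pending: int) -> int:
    """Majority over natural + pending replicas.

    Illustrative assumption only: this is not Cassandra's real
    implementation, just a way to show how a pending node raises
    the number of replicas a paxos round has to hear from.
    """
    participants = natural + pending
    return participants // 2 + 1


# Hypothetical keyspace with replication factor 3.
steady = required_participants(natural=3, pending=0)  # no node moving
moving = required_participants(natural=3, pending=1)  # one pending node

print(f"steady state: {steady} of 3 replicas must respond")
print(f"while moving: {moving} of 4 replicas must respond")
```

Under that assumption a round needs 2 of 3 replicas in the steady state but 3 
of 4 while one node is pending, so every round depends on more replicas and 
has more room to be preempted by a competing proposal.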

> Paxos/LWT failures when moving node
> -----------------------------------
>
>                 Key: CASSANDRA-10423
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10423
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Cassandra version: 2.0.14
> Java-driver version: 2.0.11
>            Reporter: Roger Schildmeijer
>
> While moving a node (nodetool move <newtoken>) we noticed that lwt started 
> failing for some (~50%) requests. The java-driver (version 2.0.11) returned 
> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout 
> during write query at consistency SERIAL (7 replica were required but only 0 
> acknowledged the write). The cluster was not under heavy load.
> I noticed that the failed lwt requests all took just above 1s. That 
> information and the WriteTimeoutException could indicate that this happens:
> https://github.com/apache/cassandra/blob/cassandra-2.0.14/src/java/org/apache/cassandra/service/StorageProxy.java#L268
> I can't explain why though. Why would there be more cas contention just 
> because a node is moving?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
