[ https://issues.apache.org/jira/browse/CASSANDRA-13327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ariel Weisberg resolved CASSANDRA-13327.
----------------------------------------
Resolution: Not A Problem
Closing this in order to open a new ticket specifically about relaxing the
limit on the number of pending endpoints for CAS.
> Pending endpoints size check for CAS doesn't play nicely with
> writes-on-replacement
> -----------------------------------------------------------------------------------
>
> Key: CASSANDRA-13327
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13327
> Project: Cassandra
> Issue Type: Bug
> Components: Coordination
> Reporter: Ariel Weisberg
> Assignee: Ariel Weisberg
>
> Consider this ring:
> 127.0.0.1 MR UP JOINING -7301836195843364181
> 127.0.0.2 MR UP NORMAL -7263405479023135948
> 127.0.0.3 MR UP NORMAL -7205759403792793599
> 127.0.0.4 MR DOWN NORMAL -7148113328562451251
> where 127.0.0.1 was bootstrapping for cluster expansion. Note that, due to
> the failure of 127.0.0.4, 127.0.0.1 was stuck trying to stream from it and
> making no progress.
> Then the down node was replaced by 127.0.0.5, giving:
> 127.0.0.1 MR UP JOINING -7301836195843364181
> 127.0.0.2 MR UP NORMAL -7263405479023135948
> 127.0.0.3 MR UP NORMAL -7205759403792793599
> 127.0.0.5 MR UP JOINING -7148113328562451251
> The ring output is confusing: the first JOINING node is a genuine bootstrap,
> while the second is a replacement. We were now getting CAS unavailables (but
> no non-CAS unavailables). I think this is because the pending endpoints check
> treats 127.0.0.5 as gaining a range when it is only replacing 127.0.0.4.
> The workaround is to kill the stuck JOINING node, but Cassandra shouldn’t
> unnecessarily fail these requests.
> It also appears that the number of required participants is bumped by 1
> during a host replacement, so if the replacing host fails you will get
> unavailables and timeouts.
> This is related to the check added in CASSANDRA-8346.
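To make the two failure modes above concrete, here is a minimal sketch of a simplified CAS participant check. It is a hypothetical model, not Cassandra's code: the class PaxosParticipantCheck, its check method and the Result record are invented for illustration. It assumes (1) CAS requires a simple majority of natural plus pending replicas, which matches the "bumped by 1" observation for RF=3, (2) the CASSANDRA-8346 guard refuses CAS whenever more than one endpoint is pending for a token, regardless of whether it is bootstrapping or replacing, and (3) RF=3, with the dead 127.0.0.4 still counted as a natural replica while 127.0.0.5 is pending. It needs Java 16+ for records.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Stream;

/**
 * Hypothetical, simplified model of the CAS participant check discussed in
 * CASSANDRA-13327 (guard from CASSANDRA-8346). Not Cassandra's actual code.
 */
public class PaxosParticipantCheck {

    /** Outcome of the simplified check. */
    public record Result(boolean available, int required, int live, String reason) {}

    public static Result check(List<String> natural, List<String> pending, Set<String> up) {
        // Assumption: a simple majority over natural + pending replicas,
        // so every pending endpoint can raise the requirement.
        int participants = natural.size() + pending.size();
        int required = participants / 2 + 1;
        int live = (int) Stream.concat(natural.stream(), pending.stream())
                .filter(up::contains)
                .count();
        if (live < required) {
            return new Result(false, required, live, "not enough live participants");
        }
        // Assumption: more than one pending endpoint is refused outright,
        // even when one of them is only a replacement.
        if (pending.size() > 1) {
            return new Result(false, required, live, "more than one pending endpoint");
        }
        return new Result(true, required, live, "ok");
    }

    public static void main(String[] args) {
        // Scenario 1: .1 bootstraps into a range replicated by .2, .3 and the
        // dead .4, while .5 replaces .4. Both are pending, so CAS is refused.
        Result both = check(
                List.of("127.0.0.2", "127.0.0.3", "127.0.0.4"),
                List.of("127.0.0.1", "127.0.0.5"),
                Set.of("127.0.0.1", "127.0.0.2", "127.0.0.3", "127.0.0.5"));
        System.out.println("bootstrap + replacement: " + both);

        // Scenario 2: only the replacement is pending, but .5 has also failed.
        // The requirement rises from 2 (of 3) to 3 (of 4), with just 2 live.
        Result replacementDown = check(
                List.of("127.0.0.2", "127.0.0.3", "127.0.0.4"),
                List.of("127.0.0.5"),
                Set.of("127.0.0.2", "127.0.0.3"));
        System.out.println("replacement down: " + replacementDown);
    }
}
```

In the first scenario enough participants are alive, yet the request fails purely because of the pending-count guard. Ordinary writes have no equivalent guard, which is consistent with seeing only CAS unavailables. In the second scenario, dropping 127.0.0.5 from the pending list would leave 2 live out of 3 natural replicas, which satisfies a majority of 2. Counting the replacement as an extra participant is what turns the failure of the replacing host into an unavailable.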