[ 
https://issues.apache.org/jira/browse/CASSANDRA-13851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309845#comment-16309845
 ] 

Sam Tunnicliffe commented on CASSANDRA-13851:
---------------------------------------------

I'm +1 on this latest version, though it occurs to me that there is something 
else we could do to help full cluster bounces that are done in one shot 
(per-replica set or otherwise partial bounces will now proceed ok).

Failure to receive an ack within RING_DELAY will terminate the shadow round, 
fatally for a node not in its own seed list. So if we make non-seeds remain in 
the shadow round (SR) for longer than seeds (e.g. for RING_DELAY * 2), then as 
long as a single seed is contactable, startup should be able to proceed.
 
For example, all peers have nodes 1, 2 & 3 configured as seeds, but 2 & 3 have 
failed. If the cluster is completely stopped and restarted, node1 will exit its 
SR after RING_DELAY and be available to ack the other nodes' syn requests. Once 
other, non-seed nodes start to come up, they will also now ack shadow round 
syns. This would increase startup times for a full bounce when some seeds are 
failing/missing, but in "normal" circumstances it would have no impact. 
It wouldn't help if all of the seeds 1, 2 & 3 were down during a full bounce, 
but I'd consider that tradeoff acceptable.
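
A minimal sketch of the timeout rule proposed above, assuming a hypothetical 
helper (shadowRoundTimeoutMillis is not an existing Cassandra method, and the 
30 second RING_DELAY is only an illustrative value). It shows seeds leaving the 
shadow round after RING_DELAY while non-seeds wait RING_DELAY * 2:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ShadowRoundTimeout {
    /**
     * Hypothetical helper: how long a node should stay in the shadow round
     * before giving up on receiving an ack.
     * Seeds wait RING_DELAY; non-seeds wait twice as long, so that a seed
     * which has already left its own shadow round can ack their syns.
     */
    static long shadowRoundTimeoutMillis(String self, Set<String> seeds, long ringDelayMillis) {
        boolean isSeed = seeds.contains(self);
        return isSeed ? ringDelayMillis : 2 * ringDelayMillis;
    }

    public static void main(String[] args) {
        // Assumed topology from the example: nodes 1, 2 & 3 are the seeds.
        Set<String> seeds = new HashSet<>(Arrays.asList("node1", "node2", "node3"));
        long ringDelayMillis = 30_000L; // illustrative value only

        System.out.println("node1 (seed): "
                + shadowRoundTimeoutMillis("node1", seeds, ringDelayMillis) + " ms");
        System.out.println("node4 (non-seed): "
                + shadowRoundTimeoutMillis("node4", seeds, ringDelayMillis) + " ms");
    }
}
```

With these assumptions node1 would stop waiting after one RING_DELAY and could 
then ack node4, which is still inside its longer window.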


> Allow existing nodes to use all peers in shadow round
> -----------------------------------------------------
>
>                 Key: CASSANDRA-13851
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13851
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Lifecycle
>            Reporter: Kurt Greaves
>            Assignee: Kurt Greaves
>             Fix For: 3.11.x, 4.x
>
>
> In CASSANDRA-10134 we made collision checks necessary on every startup. A 
> side-effect was introduced that then requires a node's seeds to be contacted 
> on every startup. Prior to this change an existing node could start up 
> regardless of whether it could contact a seed node or not (because 
> checkForEndpointCollision() was only called for bootstrapping nodes). 
> Now if a node's seeds are removed/deleted/fail it will no longer be able to 
> start up until live seeds are configured (or it is itself made a seed), even 
> though it already knows about the rest of the ring. This is inconvenient for 
> operators and has the potential to cause some nasty surprises and increase 
> downtime.
> One solution would be to use all of a node's existing peers as seeds in the 
> shadow round. Not a Gossip guru though so not sure of implications.


