[
https://issues.apache.org/jira/browse/CASSANDRA-13851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16309845#comment-16309845
]
Sam Tunnicliffe commented on CASSANDRA-13851:
---------------------------------------------
I'm +1 on this latest version, though it occurs to me that there is something
else we could do to help full cluster bounces that are done in one shot
(per-replica set or otherwise partial bounces will now proceed ok).
Failure to receive an ack within RING_DELAY will terminate the shadow round (SR),
fatally for a node not in its own seed list. So if we make non-seeds remain in
the SR for longer than seeds (e.g. for RING_DELAY * 2), then as long as a
single seed is contactable, startup should be able to proceed.
For example, say all peers have nodes 1, 2 & 3 configured as seeds, but 2 & 3
have failed. If the cluster is completely stopped and restarted, node1 will exit
its SR after RING_DELAY and be available to ack the other nodes' syn requests.
Once other non-seed nodes start to come up, they will also now ack shadow round
syns.
This would increase startup times for a full bounce when some seeds are
failing/missing, but in "normal" circumstances it would have no impact.
It wouldn't help if all of the seeds 1, 2 & 3 were down during a full bounce,
but I'd consider that tradeoff acceptable.
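To make the suggestion concrete, here is a minimal sketch of how the shadow round timeout could be chosen. It is illustrative only and is not the actual Gossiper code: the class, the method names and the 30 second RING_DELAY used in main() are all assumptions.

```java
import java.util.Set;

// Hypothetical sketch; none of these names exist in Cassandra.
final class ShadowRoundTimeoutSketch {
    // Non-seeds wait this many RING_DELAY periods (the "RING_DELAY * 2" idea).
    static final int NON_SEED_MULTIPLIER = 2;

    // A node counts as a seed here if it appears in its own configured seed list.
    static boolean isInOwnSeedList(String localAddress, Set<String> seeds) {
        return seeds.contains(localAddress);
    }

    // Seeds give up on the shadow round after RING_DELAY; non-seeds stay in it
    // longer so that a seed which has already left its own shadow round has
    // time to ack their syns.
    static long shadowRoundTimeoutMillis(boolean inOwnSeedList, long ringDelayMillis) {
        return inOwnSeedList ? ringDelayMillis : ringDelayMillis * NON_SEED_MULTIPLIER;
    }

    public static void main(String[] args) {
        Set<String> seeds = Set.of("node1", "node2", "node3");
        long ringDelayMillis = 30_000L; // assumed value, for illustration only

        for (String node : new String[] {"node1", "node4"}) {
            boolean seed = isInOwnSeedList(node, seeds);
            System.out.println(node + " waits "
                    + shadowRoundTimeoutMillis(seed, ringDelayMillis) + " ms");
        }
    }
}
```

With this shape, node1 (a seed) leaves its shadow round one RING_DELAY before node4 (a non-seed) would give up, which is the window in which node1 can ack node4's syn.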
> Allow existing nodes to use all peers in shadow round
> -----------------------------------------------------
>
> Key: CASSANDRA-13851
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13851
> Project: Cassandra
> Issue Type: Bug
> Components: Lifecycle
> Reporter: Kurt Greaves
> Assignee: Kurt Greaves
> Fix For: 3.11.x, 4.x
>
>
> In CASSANDRA-10134 we made collision checks necessary on every startup. A
> side-effect was introduced that then requires a node's seeds to be contacted
> on every startup. Prior to this change an existing node could start up
> regardless of whether it could contact a seed node or not (because
> checkForEndpointCollision() was only called for bootstrapping nodes).
> Now if a node's seeds are removed/deleted/fail it will no longer be able to
> start up until live seeds are configured (or it is itself made a seed), even
> though it already knows about the rest of the ring. This is inconvenient for
> operators and has the potential to cause some nasty surprises and increase
> downtime.
> One solution would be to use all of a node's existing peers as seeds in the
> shadow round. Not a Gossip guru though, so not sure of the implications.