[
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015353#comment-15015353
]
Stefania commented on CASSANDRA-10243:
--------------------------------------
Thanks for the review! I plan on working early next week on your comments, see
answers below, as well as any other comments you may have.
bq. Is it necessary to check if a node is in dead state for the purpose of this
snitch check? In my understanding, if a node is on a dead state, it's neither
live nor member of the ring, so I didn't get why that check was done previously
on getLiveTokenOwners() in the first place, do you know? Maybe historical
reasons? I'd prefer to have a simpler isLiveMember() check on StorageService
(since it checks both gossip and tokenmetadata), and this method would
basically return Gossiper.isLiveEndpoint(endpoint) &&
tokenMetadata.isMember(ep), but this is a personal thing so it's up to you to
take this suggestion.
>From a quick code analysis I think leaving nodes are still members but their
>state is dead? In any case, my preference would be to leave existing code
>unchanged, especially if this goes to 2.1, but I am not opposed to simplifying
>the new liveliness check for the snitch to what you suggested,
>{{Gossiper.isLiveEndpoint(endpoint) && tokenMetadata.isMember(ep)}}, since
>this would mean leaving nodes are also live, which is safer I believe.
bq. Did you intend to decrease the default snitch configuration refresh period
from 60 to 5 seconds?
Yes I did. I reduced it so that the dtests could complete in a reasonable
amount of time. I don't see why wait for up to 60 seconds before reloading a
config file, 5 seconds is a pretty long time and it should not have any adverse
impact.
bq. On GossipingPropertyFileSnitch I think it's only necessary to check if the
dc/rack changed, or do you see a situation where one would want to live change
the rack/dc of a non-ring memmber?
Maybe start the node with {{-Djoin_ring=false}}, change the rack and then join
the ring? If the node is not live I'd say let them change the rack/dc even
though I agree it doesn't make much sense. I'm really undecided to be honest,
basically the only reason to reload the GPFS config should be to change
{{preferLocal}} so maybe we should never allow chaning dc/rack for GPFS, or
remove the config reload altogether as suggested in CASSANDRA-9474.
bq. Also on the GossipingPropertyFileSnitch maybe it's not necessary to
updateTopology/invalidateCachedRing, since topology change is not allowed
anymore?
{{updateTopology}} definitely no longer makes sense but
{{invalidateCachedRing}} is probably safer to keep it, at least on startup.
However see my next question, in which case we would need to keep
{{updateTopology}}.
Should we add a JVM property to override the liveliness checks, just as a
safety measure in case someone has a legitimate reason to change rack/dc of a
live node?
> Warn or fail when changing cluster topology live
> ------------------------------------------------
>
> Key: CASSANDRA-10243
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10243
> Project: Cassandra
> Issue Type: Improvement
> Components: Tools
> Reporter: Jonathan Ellis
> Assignee: Stefania
> Priority: Critical
> Fix For: 2.1.x
>
>
> Moving a node from one rack to another in the snitch, while it is alive, is
> almost always the wrong thing to do.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)