[ 
https://issues.apache.org/jira/browse/CASSANDRA-10243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15015353#comment-15015353
 ] 

Stefania commented on CASSANDRA-10243:
--------------------------------------

Thanks for the review! I plan on working early next week on your comments, see 
answers below, as well as any other comments you may have.

bq. Is it necessary to check if a node is in dead state for the purpose of this 
snitch check? In my understanding, if a node is on a dead state, it's neither 
live nor member of the ring, so I didn't get why that check was done previously 
on getLiveTokenOwners() in the first place, do you know? Maybe historical 
reasons? I'd prefer to have a simpler isLiveMember() check on StorageService 
(since it checks both gossip and tokenmetadata), and this method would 
basically return Gossiper.isLiveEndpoint(endpoint) && 
tokenMetadata.isMember(ep), but this is a personal thing so it's up to you to 
take this suggestion.

>From a quick code analysis I think leaving nodes are still members but their 
>state is dead? In any case, my preference would be to leave existing code 
>unchanged, especially if this goes to 2.1, but I am not opposed to simplifying 
>the new liveliness check for the snitch to what you suggested, 
>{{Gossiper.isLiveEndpoint(endpoint) && tokenMetadata.isMember(ep)}}, since 
>this would mean leaving nodes are also live, which is safer I believe.

bq. Did you intend to decrease the default snitch configuration refresh period 
from 60 to 5 seconds?

Yes I did. I reduced it so that the dtests could complete in a reasonable 
amount of time. I don't see why wait for up to 60 seconds before reloading a 
config file, 5 seconds is a pretty long time and it should not have any adverse 
impact.

bq. On GossipingPropertyFileSnitch I think it's only necessary to check if the 
dc/rack changed, or do you see a situation where one would want to live change 
the rack/dc of a non-ring memmber?

Maybe start the node with {{-Djoin_ring=false}}, change the rack and then join 
the ring? If the node is not live I'd say let them change the rack/dc even 
though I agree it doesn't make much sense. I'm really undecided to be honest, 
basically the only reason to reload the GPFS config should be to change 
{{preferLocal}} so maybe we should never allow chaning dc/rack for GPFS, or 
remove the config reload altogether as suggested in CASSANDRA-9474.
 
bq. Also on the GossipingPropertyFileSnitch maybe it's not necessary to 
updateTopology/invalidateCachedRing, since topology change is not allowed 
anymore?

{{updateTopology}} definitely no longer makes sense but 
{{invalidateCachedRing}} is probably safer to keep it, at least on startup. 
However see my next question, in which case we would need to keep 
{{updateTopology}}.

Should we add a JVM property to override the liveliness checks, just as a 
safety measure in case someone has a legitimate reason to change rack/dc of a 
live node?

> Warn or fail when changing cluster topology live
> ------------------------------------------------
>
>                 Key: CASSANDRA-10243
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10243
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Tools
>            Reporter: Jonathan Ellis
>            Assignee: Stefania
>            Priority: Critical
>             Fix For: 2.1.x
>
>
> Moving a node from one rack to another in the snitch, while it is alive, is 
> almost always the wrong thing to do.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to