[ https://issues.apache.org/jira/browse/CASSANDRA-5836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16385559#comment-16385559 ]
Kurt Greaves edited comment on CASSANDRA-5836 at 3/5/18 4:42 AM:
-----------------------------------------------------------------

Glad some more discussion is happening again here. This has always been a pain point for operators. I doubt many people can actually list every single different startup case in Cassandra; there are a hell of a lot of them.

On the seed issue: in the past I've gone so far as to write a patch that makes only the first seed a special case, as already mentioned by [~oshulgin], to at least fix the issue of maintenance on seed nodes and adding new nodes as seeds. There's a patch [here|https://github.com/apache/cassandra/compare/trunk...kgreav:13851-extension-3.11] if anyone cares; I think it's more or less working and only missing tests at the moment (but I'm probably wrong). Note it's built on my patch for CASSANDRA-13851, so not all code there is relevant.
{quote}My understanding is that the 'seed node' role has a significant initial-topology-discovery responsibility which I have not seen mentioned in recent discussions.
{quote}
Not my understanding. A seed simply defines a node to connect to in order to join the cluster. No topology information is transmitted between the seed and the node contacting it. If it were, this would bring its own complexities and likely make Gossip really expensive (especially on large clusters).
{quote}Also, as of my last knowledge of this code, a given node will gossip with a Seed node more frequently than its other peers, which I believe is "just an optimization of gossip" but seems notable.
{quote}
Yep, it's just an optimisation, but an important one. For the most part, however, it shouldn't have any effect on the bootstrap case.
{quote}I also recall past dev discussion (with driftx?) suggesting that the "correct" solution in their view is an external seed provider.
{quote}
Yeah, the correct solution is an external seed provider or not breaking your config management, but we can still do better here.
Especially in the replace case, and probably the new DC case.
{quote}Another case when you add seed nodes is when adding a new DC. In this case they are not the first ones to start so they could bootstrap, but most of the time this is not what you want, so you set auto_bootstrap=false for every node in the new DC, including the new seeds.
{quote}
It's worth noting here that there is the case of {{SimpleStrategy}}, in which you wouldn't want auto_bootstrap=false (this affects the auth, traces, and system_distributed keyspaces). This is specifically why you would want every node to bootstrap in a new DC (including seeds). The alternative is to get rid of {{SimpleStrategy}} (or at least stop using it as a default).
{quote}In the case where seed nodes cannot be contacted, how do you determine if this is the first node in a cluster (so we should special case and skip bootstrap) vs a mis-configuration or other seeds being down, in which case the bootstrap should fail?
{quote}
If the listed seed isn't itself then you fail. This is how it currently works as well. That is, the first node in the cluster has itself as a seed and also can't contact any other seeds in its seed list. I'm pretty sure my patch above works this way, since if there are seeds they should be present in the {{endpointShadowStateMap}} after the shadow round (SR). There may be some edge cases to think about here though, like starting multiple seeds at the same time.

Also related is CASSANDRA-14073, which will fix the case where you replace a seed node and it doesn't bootstrap. This one is more important IMO, as it's more likely for config management not to handle this case.

> Seed nodes should be able to bootstrap without manual intervention
> ------------------------------------------------------------------
>
>                 Key: CASSANDRA-5836
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5836
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bill Hathaway
>            Priority: Minor
>
> The current logic doesn't allow a seed node to be bootstrapped. If a user
> wants to bootstrap a node configured as a seed (for example to replace a seed
> node via replace_token), they first need to remove the node's own IP from the
> seed list, and then start the bootstrap process. This seems like an
> unnecessary step since a node never uses itself as a seed.
> I think it would be a better experience if the logic was changed to allow a
> seed node to bootstrap without manual intervention when there are other seed
> nodes up in a ring.
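
For readers trying to follow the startup cases discussed in the comment above, the decision it describes (bootstrap when another seed answers, treat only the first listed seed as the "first node" special case, otherwise fail) can be sketched roughly as below. This is only an illustration and not Cassandra code or the linked patch: `decide_startup`, `StartupAction` and `reachable_peers` are hypothetical names, `reachable_peers` stands in for whatever the gossip shadow round discovered, and the `auto_bootstrap` setting is deliberately not modelled.

```python
from enum import Enum


class StartupAction(Enum):
    BOOTSTRAP = "bootstrap"            # join an existing cluster and stream data
    SKIP_BOOTSTRAP = "skip_bootstrap"  # assumed to be the first node of a new cluster
    FAIL = "fail"                      # refuse to start: likely misconfiguration


def decide_startup(own_address, seeds, reachable_peers):
    """Hypothetical sketch of the seed startup decision described in the comment.

    own_address     -- address of the node that is starting
    seeds           -- ordered seed list from the node's configuration
    reachable_peers -- set of addresses that answered (stand-in for the
                       result of the gossip shadow round)
    """
    other_live_seeds = [
        s for s in seeds if s != own_address and s in reachable_peers
    ]
    if other_live_seeds:
        # Another seed is up, so the cluster exists: bootstrap, even if
        # this node is itself listed as a seed.
        return StartupAction.BOOTSTRAP
    if seeds and seeds[0] == own_address:
        # No other seed answered and this node is the first listed seed:
        # the only case treated as "first node in the cluster".
        return StartupAction.SKIP_BOOTSTRAP
    # No seed answered and the first listed seed is not this node.
    return StartupAction.FAIL


if __name__ == "__main__":
    seeds = ["10.0.0.1", "10.0.0.2"]
    print(decide_startup("10.0.0.1", seeds, set()))         # first seed, nobody else up
    print(decide_startup("10.0.0.2", seeds, {"10.0.0.1"}))  # seed joining a live cluster
    print(decide_startup("10.0.0.2", seeds, set()))         # non-first seed, nobody up
```

The edge case mentioned in the comment, several seeds starting at the same time, is not handled here: two nodes that each list themselves first would both take the `SKIP_BOOTSTRAP` branch.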