[ https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102374#comment-17102374 ]

Stefan Miklosovic edited comment on CASSANDRA-15158 at 5/8/20, 8:50 AM:
------------------------------------------------------------------------

Hi [~bdeggleston],

Commenting on the design issues: I am not completely sure whether the issues 
you are talking about are introduced by this patch or already exist. We could 
indeed focus on the points you raised, but it seems to me that the current 
(committed) code is worse without this patch than with it, since these 
problems are already there.

Isn't the goal here to have all nodes on the same schema version? Isn't the 
very fact that there are multiple versions strange to begin with, so we should 
not even try to join a node when the versions mismatch, and hence there is 
nothing to deal with in the first place? 
{quote}It will only wait until it has _some_ schema to begin bootstrapping, not 
all
{quote}
This is most likely not true, unless I am missing something. The node being 
bootstrapped will never advance unless all nodes report the same schema 
version. 
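For illustration, the wait I am describing could be sketched roughly like this 
(hypothetical Java with made-up names such as SchemaVersionSource; not the 
actual code in the patch):
{code:java}
import java.time.Duration;
import java.time.Instant;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: block bootstrap until every live endpoint
// reports the same schema version, or give up after a timeout.
final class SchemaAgreementWaiter
{
    interface SchemaVersionSource
    {
        /** endpoint -> schema version currently reported via gossip */
        Map<String, String> liveEndpointVersions();
    }

    static boolean waitForAgreement(SchemaVersionSource source, Duration timeout) throws InterruptedException
    {
        Instant deadline = Instant.now().plus(timeout);
        while (Instant.now().isBefore(deadline))
        {
            Set<String> distinct = new HashSet<>(source.liveEndpointVersions().values());
            if (distinct.size() <= 1)
                return true;      // all live nodes agree (or there are no peers yet)
            Thread.sleep(1000);   // re-check the gossip state periodically
        }
        return false;             // the caller should fail the bootstrap
    }
}
{code}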
{quote} For instance, if a single node is reporting a schema version that no 
one else has, but the node is unreachable, what do we do?
{quote}
We should fail the whole bootstrap and the operator should go and fix it.
{quote}For instance, if a single node is reporting a schema version that no one 
else has, but the node is unreachable, what do we do?
{quote}
How can a node report its schema while being unreachable?
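To make the "fail and let someone fix it" idea above concrete, this is roughly 
what I have in mind (again only a sketch; FailureDetectorView is a made-up 
stand-in for the liveness check, not a real class):
{code:java}
import java.util.Map;

// Illustrative sketch only: if an endpoint reports a schema version that
// differs from ours but the endpoint is unreachable, abort the bootstrap
// so an operator can intervene instead of guessing.
final class UniqueVersionCheck
{
    interface FailureDetectorView
    {
        boolean isAlive(String endpoint);
    }

    static void ensureResolvable(Map<String, String> endpointVersions,
                                 String localVersion,
                                 FailureDetectorView fd)
    {
        for (Map.Entry<String, String> e : endpointVersions.entrySet())
        {
            boolean differs = !e.getValue().equals(localVersion);
            if (differs && !fd.isAlive(e.getKey()))
                throw new IllegalStateException(
                    "Endpoint " + e.getKey() + " reports schema version " + e.getValue()
                    + " but is unreachable; aborting bootstrap");
        }
    }
}
{code}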
{quote}Next, I like how this limits the number of messages sent to a given 
endpoint, but we should also limit the number of messages we send out for a 
given schema version. If we have a large cluster, and all nodes are reporting 
the same version, we don't need to ask every node for it's schema.
{quote}
 

Got it; this could be tracked.
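If it helps, the limiting you describe could look something like the sketch 
below (illustrative only, with hypothetical names; not the actual coordinator 
changes): one pull request per distinct schema version rather than one per 
endpoint.
{code:java}
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.BiConsumer;

// Illustrative sketch only: deduplicate schema pulls by version so that a
// large cluster reporting a single version triggers a single request.
final class PerVersionPullLimiter
{
    private final Set<String> versionsInFlight = new HashSet<>();

    /** requestSchema accepts (endpoint, version) and sends one pull message. */
    synchronized void pullOncePerVersion(Map<String, String> endpointVersions,
                                         String localVersion,
                                         BiConsumer<String, String> requestSchema)
    {
        for (Map.Entry<String, String> e : endpointVersions.entrySet())
        {
            String version = e.getValue();
            if (version.equals(localVersion) || versionsInFlight.contains(version))
                continue;                              // already have it, or already asked someone
            versionsInFlight.add(version);
            requestSchema.accept(e.getKey(), version); // ask only this endpoint for this version
        }
    }

    synchronized void completed(String version)
    {
        versionsInFlight.remove(version);              // permit a retry via another endpoint
    }
}
{code}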

 

When it comes to testing, I admit that adding the isRunningForcibly method 
feels like a hack, but I had a very hard time testing this. It was basically 
the only reasonable approach I could find while coding it; if you know of a 
better way, please tell me, otherwise I am not sure what would be better here 
and we could stick with this for the time being. The whole testing methodology 
is based on these callbacks and checking their inner state, which results in 
methods that accept them so we can inspect their state. Without "injecting" 
them from the outside, I would not be able to do that.
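To show what I mean by injecting the callbacks, the test setup is roughly of 
this shape (simplified, hypothetical names; the real code obviously has more 
moving parts):
{code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch only: the component takes its callback from outside,
// so a test can hand in a recording implementation and assert on its state.
final class MigrationTaskRunner
{
    interface PullCallback
    {
        void onResponse(String endpoint);
        void onFailure(String endpoint);
    }

    private final PullCallback callback;

    MigrationTaskRunner(PullCallback callback)    // injected, not created internally
    {
        this.callback = callback;
    }

    void handleResponse(String endpoint, boolean success)
    {
        if (success) callback.onResponse(endpoint);
        else callback.onFailure(endpoint);
    }
}

// In a unit test, a recording callback is injected and inspected afterwards.
final class RecordingCallback implements MigrationTaskRunner.PullCallback
{
    final AtomicInteger responses = new AtomicInteger();
    final AtomicInteger failures = new AtomicInteger();

    public void onResponse(String endpoint) { responses.incrementAndGet(); }
    public void onFailure(String endpoint)  { failures.incrementAndGet(); }
}
{code}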


was (Author: stefan.miklosovic):
Hi [~bdeggleston],

Commenting on the design issues: I am not completely sure whether the issues 
you are talking about are introduced by this patch or already exist. We could 
indeed focus on the points you raised, but it seems to me that the current 
(committed) code is worse without this patch than with it, since these 
problems are already there.

Isn't the goal here to have all nodes on the same schema version? Isn't the 
very fact that there are multiple versions strange to begin with, so we should 
not even try to join a node when the versions mismatch, and hence there is 
nothing to deal with in the first place? 
{quote}It will only wait until it has _some_ schema to begin bootstrapping, not 
all
{quote}
This is most likely not true, unless I am missing something. The node being 
bootstrapped will never advance unless all nodes report the same schema 
version. 
{quote} For instance, if a single node is reporting a schema version that no 
one else has, but the node is unreachable, what do we do?
{quote}
We should fail the whole bootstrap and the operator should go and fix it.
{quote}For instance, if a single node is reporting a schema version that no one 
else has, but the node is unreachable, what do we do?
{quote}
How can a node report its schema while being unreachable?
{quote}Next, I like how this limits the number of messages sent to a given 
endpoint, but we should also limit the number of messages we send out for a 
given schema version. If we have a large cluster, and all nodes are reporting 
the same version, we don't need to ask every node for it's schema.
{quote}
-I am sorry, I am not following what you are saying here, in particular the 
very last sentence. I think the schema is ever pulled (a message is sent) 
_only_ when the schema version reported by the Gossiper differs; only after 
that do we ever send a message.-

I am taking this back; you might be right here, I see what you mean, but this 
makes the whole solution even more complicated.

When it comes to testing, I admit that adding the isRunningForcibly method 
feels like a hack, but I had a very hard time testing this. It was basically 
the only reasonable approach I could find while coding it; if you know of a 
better way, please tell me, otherwise I am not sure what would be better here 
and we could stick with this for the time being. The whole testing methodology 
is based on these callbacks and checking their inner state, which results in 
methods that accept them so we can inspect their state. Without "injecting" 
them from the outside, I would not be able to do that.

> Wait for schema agreement rather than in-flight schema requests when 
> bootstrapping
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Cluster/Schema
>            Reporter: Vincent White
>            Assignee: Ben Bromhead
>            Priority: Normal
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes was unexpectedly slow 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> getting stuck for extended amounts of time as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  
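For readers following along, the latch-based wait the description above refers 
to is, in very condensed and hypothetical form, a pattern like the following 
(not the actual MigrationTask/MessagingService code): one latch per in-flight 
schema pull, each awaited for a bounded time, with nothing guaranteeing a 
schema actually arrived if the pull's callback expired.
{code:java}
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Condensed, illustrative sketch of the latch mechanism described in the
// ticket: a latch per in-flight schema pull, each awaited for a bounded time.
// If the pull's callback expired, the latch is never counted down and the
// wait simply times out without any schema having been received.
final class InflightSchemaPulls
{
    private final Queue<CountDownLatch> inflightTasks = new ConcurrentLinkedQueue<>();

    CountDownLatch register()
    {
        CountDownLatch latch = new CountDownLatch(1);
        inflightTasks.add(latch);
        return latch;                 // counted down when a schema response is applied
    }

    /** Mirrors the per-latch wait bounded by migration_task_wait_in_seconds. */
    void awaitAll(long waitSeconds) throws InterruptedException
    {
        CountDownLatch latch;
        while ((latch = inflightTasks.poll()) != null)
            latch.await(waitSeconds, TimeUnit.SECONDS);   // may time out with no schema received
    }
}
{code}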


