[https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171139#comment-17171139]
Blake Eggleston commented on CASSANDRA-15158:
---------------------------------------------
{quote}
I am not completely sure why we are pulling again here. I would rewrite the
whole solution in such a way that this Callable does just one thing on a
successful response (merging the schema) and the actual "retry" is
handled from outside. The reader has to do quite a mental exercise to
visualise that this callback might actually call another callback within it until
some "version" is completed, etc. At least for me, it was quite tedious to
track.
{quote}
In the case of a successful pull, we won't pull again. Response and fail both
call pullComplete, but an additional pull is only scheduled when pullComplete
is called from fail.
I get that this can be a bit difficult to follow, but I'm not sure there's a
better approach, given that schema pulls are completely event-driven during
normal runtime. If we miss a schema change during normal runtime (not
bootstrap), there's nothing waiting on schema convergence that would enable us
to retry from the outside.
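A minimal sketch of that flow, assuming hypothetical names (VersionPuller, maybePull, onResponse, onFailure) rather than the classes in the patch: both callbacks release the in-flight slot through pullComplete, but only the failure path immediately tries the next endpoint that reported the version.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: VersionPuller and its methods are illustrative names,
// not the classes used in the CASSANDRA-15158 patch.
class VersionPuller {
    private final Deque<String> candidates;          // endpoints that reported the version
    final List<String> attempts = new ArrayList<>(); // endpoints we have sent a pull to
    boolean inFlight = false;
    boolean merged = false;

    VersionPuller(List<String> endpoints) {
        this.candidates = new ArrayDeque<>(endpoints);
    }

    /** Sends a pull to the next candidate unless one is in flight or we are done. */
    synchronized void maybePull() {
        if (inFlight || merged || candidates.isEmpty())
            return;
        String endpoint = candidates.poll();
        attempts.add(endpoint);
        inFlight = true;
        // A real implementation would send the schema pull request to `endpoint` here.
    }

    /** Shared by both callbacks: the request is no longer outstanding. */
    private void pullComplete() {
        inFlight = false;
    }

    /** Successful response: merge the schema; no further pull is scheduled. */
    synchronized void onResponse() {
        pullComplete();
        merged = true; // stand-in for merging the received schema
    }

    /** Failure or timeout: release the slot, then try the next candidate. */
    synchronized void onFailure() {
        pullComplete();
        maybePull();
    }
}
```

With three candidate endpoints, a failed first pull moves on to the second endpoint; a successful response there stops any further pulls.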
There is a periodic task that pulls schema for outstanding versions that don't
have any in-flight requests^[1]^, but it only runs once a minute, and we need
to be more proactive about learning of schema updates, since we'll be unable
to serve some reads and writes until we're up to date.
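To illustrate the fallback, here is a sketch of a once-a-minute sweep. It assumes a hypothetical OutstandingVersion holder with an inFlight flag and is not the actual Cassandra task: each run only pulls versions that have no request in flight, so it does not compete with the event-driven path.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: OutstandingVersion, PeriodicSchemaSweep, sweep() and
// start() are illustrative names, not Cassandra's API.
class OutstandingVersion {
    final UUID version;
    boolean inFlight;

    OutstandingVersion(UUID version, boolean inFlight) {
        this.version = version;
        this.inFlight = inFlight;
    }
}

class PeriodicSchemaSweep {
    final Map<UUID, OutstandingVersion> outstanding = new LinkedHashMap<>();
    final List<UUID> pulled = new ArrayList<>(); // records which versions we pulled

    /** Pulls every outstanding version that has no request in flight. */
    synchronized void sweep() {
        for (OutstandingVersion v : outstanding.values()) {
            if (!v.inFlight) {
                v.inFlight = true;     // mark first so the next run doesn't double-pull
                pulled.add(v.version); // stand-in for sending the pull request
            }
        }
    }

    /** Runs sweep() once a minute; the event-driven callbacks handle the common case. */
    ScheduledExecutorService start() {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(this::sweep, 1, 1, TimeUnit.MINUTES);
        return executor;
    }
}
```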
{quote}TBH that is quite counterintuitive too
{quote}
Could you expand on what's counterintuitive about it? If the endpoint's schema
version has changed, we need to disassociate it from its previously reported
version. I have added a comment saying as much.
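The bookkeeping behind this can be sketched as two maps kept in sync; EndpointVersionTracker and reportVersion are hypothetical names, not the patch's API. When an endpoint reports a new version, it is removed from the set for its old version so we never try to pull the old version from it. The old version's entry is left in place even if it becomes empty, which is the situation footnote [1] describes.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch: EndpointVersionTracker and reportVersion are
// illustrative names, not the API introduced by the patch.
class EndpointVersionTracker {
    final Map<String, UUID> versionByEndpoint = new HashMap<>();
    final Map<UUID, Set<String>> endpointsByVersion = new HashMap<>();

    /** Called whenever gossip reports `version` for `endpoint`. */
    synchronized void reportVersion(String endpoint, UUID version) {
        UUID previous = versionByEndpoint.put(endpoint, version);
        if (previous != null && !previous.equals(version)) {
            // The endpoint no longer has `previous`, so it must stop being a
            // pull source for it. The (possibly empty) entry is kept.
            Set<String> old = endpointsByVersion.get(previous);
            if (old != null)
                old.remove(endpoint);
        }
        endpointsByVersion.computeIfAbsent(version, v -> new HashSet<>()).add(endpoint);
    }
}
```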
{quote}The test has failed for me (repeatedly):
{quote}
Thanks, it should be passing now.
[1] This handles the case where all nodes reporting a given version have since
moved to a different version, so we can't pull that schema from them. It also
acts as a hedge against any bugs in this implementation that might cause us to
not schedule schema pulls as intended.
> Wait for schema agreement rather than in flight schema requests when
> bootstrapping
> ----------------------------------------------------------------------------------
>
> Key: CASSANDRA-15158
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip, Cluster/Schema
> Reporter: Vincent White
> Assignee: Blake Eggleston
> Priority: Normal
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Currently, when a node is bootstrapping, we use a set of latches
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of
> in-flight schema pull requests, and we don't proceed with
> bootstrapping/streaming until all the latches are released (or we time out
> waiting for each one). One issue with this is that if we have a large schema,
> or the retrieval of the schema from the other nodes was unexpectedly slow,
> then we have no explicit check in place to ensure we have actually received a
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the
> node to wait on each latch longer, there are cases where this doesn't help
> because the callbacks for the schema pull requests have expired off the
> messaging service's callback map
> (org.apache.cassandra.net.MessagingService#callbacks) after
> request_timeout_in_ms (default 10 seconds) before the other nodes were able
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the
> rest of the live nodes before proceeding with bootstrapping. It also adds a
> check to prevent the new node from flooding existing nodes with simultaneous
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters
> from getting stuck for extended periods of time as they wait
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the
> timed-out callbacks.
>
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>
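The schema agreement check in the description can be sketched as a polling loop. SchemaAgreement, inAgreement and waitForAgreement are hypothetical names, and the timeout and poll interval are caller-supplied assumptions rather than Cassandra settings; the sketch does not cover the pull throttling the description also mentions.

```java
import java.util.Collection;
import java.util.UUID;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch: SchemaAgreement and its methods are illustrative names,
// not the API introduced by the patch. Timeout and poll interval are assumptions.
final class SchemaAgreement {
    private SchemaAgreement() {}

    /** True if every live peer reports the local schema version (an empty set counts as agreement). */
    static boolean inAgreement(UUID localVersion, Collection<UUID> liveVersions) {
        for (UUID version : liveVersions)
            if (!version.equals(localVersion))
                return false;
        return true;
    }

    /**
     * Re-checks agreement every pollMillis until it holds or timeoutMillis elapses.
     * Returns false on timeout or interruption.
     */
    static boolean waitForAgreement(Supplier<UUID> localVersion,
                                    Supplier<Collection<UUID>> liveVersions,
                                    long timeoutMillis, long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (!inAgreement(localVersion.get(), liveVersions.get())) {
            if (System.nanoTime() - deadline >= 0)
                return false;
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
        return true;
    }
}
```

Unlike waiting on per-request latches, this loop checks the outcome (matching versions) directly, so an expired callback cannot let bootstrap proceed without a schema.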
--
This message was sent by Atlassian Jira
(v8.3.4#803005)