[ https://issues.apache.org/jira/browse/CASSANDRA-18096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643884#comment-17643884 ]
Brandon Williams commented on CASSANDRA-18096:
----------------------------------------------
bq. So it is in NORMAL but it is not alive yet which is quite strange.
Normal/joining/leaving/etc refers to membership status, not up/down.
> Do not spam the logs with MigrationCoordinator not able to pull schemas on
> bootstrap
> ------------------------------------------------------------------------------------
>
> Key: CASSANDRA-18096
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18096
> Project: Cassandra
> Issue Type: Improvement
> Components: Cluster/Schema
> Reporter: Stefan Miklosovic
> Assignee: Stefan Miklosovic
> Priority: Low
> Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 4.x
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When a node is joining a cluster, there is this output upon startup:
> {code}
> cassandra_node_6 | INFO [GossipStage:1] 2022-12-06 12:48:07,187
> Gossiper.java:1413 - Node /172.19.0.5:7000 is now part of the cluster
> cassandra_node_6 | WARN MigrationCoordinator.java:650 - Can't send schema
> pull request: node /172.19.0.5:7000 is down.
> cassandra_node_6 | WARN MigrationCoordinator.java:650 - Can't send schema
> pull request: node /172.19.0.5:7000 is down.
> cassandra_node_6 | WARN MigrationCoordinator.java:650 - Can't send schema
> pull request: node /172.19.0.5:7000 is down.
> cassandra_node_6 | WARN MigrationCoordinator.java:650 - Can't send schema
> pull request: node /172.19.0.5:7000 is down.
> cassandra_node_6 | WARN MigrationCoordinator.java:650 - Can't send schema
> pull request: node /172.19.0.5:7000 is down.
> cassandra_node_6 | WARN MigrationCoordinator.java:650 - Can't send schema
> pull request: node /172.19.0.5:7000 is down.
> cassandra_node_6 | WARN MigrationCoordinator.java:650 - Can't send schema
> pull request: node /172.19.0.5:7000 is down.
> {code}
> These warnings are repeated for many of the already existing nodes. The log
> is misleading: the schema pull request is indeed not sent because the "node
> is down", but the node is not actually down. The joining node only thinks it
> is, because Gossiper has not had a chance to receive any gossip about these
> nodes _yet_.
> I added more logging and it prints this:
> {code}
> MigrationCoordinator.java:655 - Can't send schema pull request: node
> /172.19.0.5:7000 is down: NORMAL, isAlive: false
> {code}
> This is the code that produces it:
> {code}
> if (!gossiper.hasEndpointState(endpoint))
>     return;
> if (!gossiper.isAlive(endpoint))
> {
>     EndpointState endpointStateForEndpoint = gossiper.getEndpointStateForEndpoint(endpoint);
>     String status = Gossiper.getGossipStatus(endpointStateForEndpoint);
>     logger.warn("Can't send schema pull request: node {} is down: {}, isAlive: {}",
>                 endpoint, status, endpointStateForEndpoint.isAlive());
>     callback.onFailure(endpoint, RequestFailureReason.UNKNOWN);
>     return;
> }
> {code}
> So it is in NORMAL but it is not alive yet which is quite strange.
> The fix is to still return early, but to log at WARN only when isAlive is
> false and the status is _not_ NORMAL. In the remaining case (NORMAL but not
> alive yet) we would still log at TRACE at least.
> (1)
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/MigrationCoordinator.java#L648-L653
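
The log-level rule proposed in the issue can be sketched in isolation. This is only an illustration under stated assumptions, not the actual patch: the class `SchemaPullLogLevel`, the enum `Level`, and the method `choose` are hypothetical names that do not exist in Cassandra, and the status strings are passed in as plain values instead of being read from Gossiper. It treats membership status and liveness as two independent inputs, which is the distinction made in the comment above.

```java
// Hypothetical sketch: none of these names come from Cassandra itself.
final class SchemaPullLogLevel {

    // NONE means the endpoint is alive, so the pull request can be sent.
    enum Level { NONE, TRACE, WARN }

    /**
     * @param status gossip membership status such as "NORMAL" or "LEAVING"
     *               (membership only; says nothing about up/down)
     * @param alive  whether the endpoint is currently considered up
     */
    static Level choose(String status, boolean alive) {
        if (alive) {
            return Level.NONE;
        }
        // A NORMAL member that is not marked alive yet is expected while
        // a node is still joining, so keep it out of the WARN log.
        if ("NORMAL".equals(status)) {
            return Level.TRACE;
        }
        // Not alive and not NORMAL: still worth a warning.
        return Level.WARN;
    }

    public static void main(String[] args) {
        System.out.println(choose("NORMAL", false));  // prints TRACE
        System.out.println(choose("LEAVING", false)); // prints WARN
        System.out.println(choose("NORMAL", true));   // prints NONE
    }
}
```

In every not-alive case the caller would still return early without sending the request, as the issue describes; only the log level changes.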