Stefan Miklosovic created CASSANDRA-18096:
---------------------------------------------
Summary: Do not spam the logs with MigrationCoordinator not able
to pull schemas on bootstrap
Key: CASSANDRA-18096
URL: https://issues.apache.org/jira/browse/CASSANDRA-18096
Project: Cassandra
Issue Type: Improvement
Components: Cluster/Schema
Reporter: Stefan Miklosovic
Assignee: Stefan Miklosovic
When a node is joining a cluster, there is this output upon startup:
{code}
cassandra_node_6 | INFO [GossipStage:1] 2022-12-06 12:48:07,187
Gossiper.java:1413 - Node /172.19.0.5:7000 is now part of the cluster
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,212
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,213
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
cassandra_node_6 | INFO [GossipStage:1] 2022-12-06 12:48:07,213
TokenMetadata.java:539 - Updating topology for /172.19.0.5:7000
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,213
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
cassandra_node_6 | INFO [GossipStage:1] 2022-12-06 12:48:07,213
TokenMetadata.java:539 - Updating topology for /172.19.0.5:7000
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,214
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,214
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,214
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,214
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,215
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,215
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,215
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,215
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,215
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
cassandra_node_6 | WARN [MigrationStage:1] 2022-12-06 12:48:07,215
MigrationCoordinator.java:650 - Can't send schema pull request: node
/172.19.0.5:7000 is down.
{code}
The same warning is logged for many of the already existing nodes; you get the
idea. The log is misleading: the schema pull request indeed cannot be sent
because the "node is down", but the node is not actually down. The joining node
only thinks it is, because Gossiper has not had a chance to receive any gossip
about these nodes _yet_.
I added more logging and it prints this:
{code}
MigrationCoordinator.java:655 - Can't send schema pull request: node
/172.19.0.5:7000 is down: NORMAL, isAlive: false
{code}
That is the output when I do this:
{code}
if (!gossiper.hasEndpointState(endpoint))
    return;

if (!gossiper.isAlive(endpoint))
{
    EndpointState endpointStateForEndpoint = gossiper.getEndpointStateForEndpoint(endpoint);
    String status = Gossiper.getGossipStatus(endpointStateForEndpoint);
    logger.warn("Can't send schema pull request: node {} is down: {}, isAlive: {}",
                endpoint, status, endpointStateForEndpoint.isAlive());
    callback.onFailure(endpoint, RequestFailureReason.UNKNOWN);
    return;
}
{code}
So the node is in NORMAL status but it is not alive yet, which is quite strange.
The fix is to still return early, but to skip the logging on WARN only in case
isAlive is false and status is _not_ NORMAL. We would, however, still log on
TRACE at least.
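As a rough illustration only, the log-level choice for a node that is not alive could be isolated in a small helper. The sketch below follows the condition exactly as worded above; the class and method names (SchemaPullLogLevel, levelForDownNode) and the "BOOT" status value are hypothetical and used just for the example, and none of this is taken from the actual patch.

```java
// Hypothetical helper, not part of Cassandra's MigrationCoordinator.
final class SchemaPullLogLevel
{
    enum Level { WARN, TRACE }

    // Intended to be called only after gossiper.isAlive(endpoint) returned false.
    // Condition as worded in the proposal: when the status is not NORMAL,
    // skip WARN and log on TRACE instead; otherwise keep WARN.
    static Level levelForDownNode(String status)
    {
        return "NORMAL".equals(status) ? Level.WARN : Level.TRACE;
    }

    public static void main(String[] args)
    {
        // "BOOT" is only an example of a non-NORMAL status string.
        System.out.println("BOOT   -> " + levelForDownNode("BOOT"));
        System.out.println("NORMAL -> " + levelForDownNode("NORMAL"));
    }
}
```

Keeping the rule in one method would make it easy to adjust if the intended condition turns out to differ. The early return and the callback.onFailure call would stay as they are in either branch.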
(1)
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/MigrationCoordinator.java#L648-L653