[ https://issues.apache.org/jira/browse/CASSANDRA-11038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sam Tunnicliffe updated CASSANDRA-11038:
----------------------------------------
Fix Version/s: 3.x, 3.0.x, 2.2.x
Status: Patch Available (was: Open)
Pushed branches with fixes for 2.2, 3.0, 3.7 and trunk, though the fix merges
forward cleanly apart from conflicts where I've cleaned up imports. Basically,
these preserve the existing behaviour of delivering both {{NEW_NODE}} and
{{UP}} events when a node first joins the cluster, and of delaying both until
after the node becomes available for clients. The erroneous {{NEW_NODE}} event when a
known node is restarted has been removed. The tracking of pushed notifications
in {{EventNotifier}} is still necessary for now (for
[reasons|https://issues.apache.org/jira/browse/CASSANDRA-7816?focusedCommentId=14346387&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14346387]),
but it will go away with CASSANDRA-9156. See CASSANDRA-11731 for some
related discussion.
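To make the intended event semantics concrete, here is a minimal Java sketch of the policy described above. {{TopologyNotifier}}, its callbacks and the string-encoded events are hypothetical names for illustration, not Cassandra's actual {{EventNotifier}}: a {{NEW_NODE}} is emitted only for an endpoint never seen before and is held until the endpoint is client-ready, a restart of a known endpoint yields only {{UP}}, and repeated status events are suppressed by remembering the last status pushed per endpoint.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the fixed notification policy; names are illustrative only.
public class TopologyNotifier {
    enum Event { NEW_NODE, UP, DOWN }

    // Endpoints that have ever joined; a restart of one of these is not a join.
    private final Set<String> knownEndpoints = new HashSet<>();
    // Endpoints that joined but whose NEW_NODE is held until they can serve clients.
    private final Set<String> pendingJoin = new HashSet<>();
    // Last status event pushed per endpoint, used to suppress duplicates.
    private final Map<String, Event> lastStatus = new HashMap<>();
    // Events "pushed" to clients, recorded as "EVENT endpoint".
    final List<String> pushed = new ArrayList<>();

    // Gossip reports a join (in the quoted 2.0 code this also fires on restart).
    void onJoin(String endpoint) {
        if (knownEndpoints.add(endpoint)) {
            pendingJoin.add(endpoint); // genuinely new endpoint: defer NEW_NODE
        }
        // known endpoint restarting: no NEW_NODE at all
    }

    // The endpoint is now available for client connections.
    void onClientReady(String endpoint) {
        if (pendingJoin.remove(endpoint)) {
            pushed.add(Event.NEW_NODE + " " + endpoint);
        }
        pushStatus(endpoint, Event.UP);
    }

    void onDown(String endpoint) {
        pushStatus(endpoint, Event.DOWN);
    }

    private void pushStatus(String endpoint, Event status) {
        // Push only if the status differs from the last one pushed for this endpoint.
        if (lastStatus.put(endpoint, status) != status) {
            pushed.add(status + " " + endpoint);
        }
    }

    public static void main(String[] args) {
        TopologyNotifier n = new TopologyNotifier();
        n.onJoin("10.0.0.4");        // first join
        n.onClientReady("10.0.0.4"); // -> NEW_NODE, UP
        n.onDown("10.0.0.4");        // -> DOWN
        n.onJoin("10.0.0.4");        // restart: no NEW_NODE
        n.onClientReady("10.0.0.4"); // -> UP
        n.onClientReady("10.0.0.4"); // duplicate UP suppressed
        System.out.println(n.pushed);
    }
}
```

Running {{main}} prints {{[NEW_NODE 10.0.0.4, UP 10.0.0.4, DOWN 10.0.0.4, UP 10.0.0.4]}}: one {{NEW_NODE}} for the first join and none for the restart.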
dtest branch [here|https://github.com/beobal/cassandra-dtest/tree/11038]
||branch||testall||dtest||
|[11038-2.2|https://github.com/beobal/cassandra/tree/11038-2.2]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-2.2-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-2.2-dtest]|
|[11038-3.0|https://github.com/beobal/cassandra/tree/11038-3.0]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.0-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.0-dtest]|
|[11038-3.7|https://github.com/beobal/cassandra/tree/11038-3.7]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.7-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-3.7-dtest]|
|[11038-trunk|https://github.com/beobal/cassandra/tree/11038-trunk]|[testall|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-trunk-testall]|[dtest|http://cassci.datastax.com/view/Dev/view/beobal/job/beobal-11038-trunk-dtest]|
(So far I've only kicked off CI for the 2.2 branch, in case there's some
problem I didn't run into locally; I'll kick off the other jobs when that
finishes.)
> Is node being restarted treated as node joining?
> ------------------------------------------------
>
> Key: CASSANDRA-11038
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11038
> Project: Cassandra
> Issue Type: Bug
> Components: Distributed Metadata
> Reporter: cheng ren
> Assignee: Sam Tunnicliffe
> Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> Hi,
> What we found recently is that every time we restart a node, all other nodes
> in the cluster treat the restarted node as a newly joining node and issue a
> node-joining notification to clients. We have traced the code path that is hit
> when a peer node detects a restarted node:
> src/java/org/apache/cassandra/gms/Gossiper.java
> {code}
> private void handleMajorStateChange(InetAddress ep, EndpointState epState)
> {
>     if (!isDeadState(epState))
>     {
>         if (endpointStateMap.get(ep) != null)
>             logger.info("Node {} has restarted, now UP", ep);
>         else
>             logger.info("Node {} is now part of the cluster", ep);
>     }
>     if (logger.isTraceEnabled())
>         logger.trace("Adding endpoint state for " + ep);
>     endpointStateMap.put(ep, epState);
>     // the node restarted: it is up to the subscriber to take whatever action is necessary
>     for (IEndpointStateChangeSubscriber subscriber : subscribers)
>         subscriber.onRestart(ep, epState);
>     if (!isDeadState(epState))
>         markAlive(ep, epState);
>     else
>     {
>         logger.debug("Not marking " + ep + " alive due to dead state");
>         markDead(ep, epState);
>     }
>     // onJoin is called unconditionally, for restarted endpoints as well as new ones
>     for (IEndpointStateChangeSubscriber subscriber : subscribers)
>         subscriber.onJoin(ep, epState);
> }
> {code}
> {{subscriber.onJoin(ep, epState)}} ends up calling {{onJoinCluster}} in
> Server.java:
> src/java/org/apache/cassandra/transport/Server.java
> {code}
> public void onJoinCluster(InetAddress endpoint)
> {
>     server.connectionTracker.send(Event.TopologyChange.newNode(getRpcAddress(endpoint),
>                                                                server.socket.getPort()));
> }
> {code}
> We have a full trace of the code path but have skipped some intermediate
> function calls here for brevity.
> Upon receiving the node-joining notification, clients go and scan the
> system.peers table to fetch the latest topology information. Since we have
> tens of thousands of client connections, the scans from all of them put an
> enormous load on our cluster.
> Although newer versions of the driver skip fetching the peers table if the
> new node already exists in local metadata, we are still curious why a
> restarted node is handled as a joining node on the server side. Did we hit a
> bug, or is this the intended behaviour? We are using Java driver version 1.0.4
> and Cassandra version 2.0.12.
> Thanks!
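The newer-driver behaviour mentioned in the issue (skipping the system.peers scan when the announced node is already in local metadata) can be modelled with a minimal sketch. {{PeersRefreshGuard}} and its methods are hypothetical names, not the DataStax Java driver's API, and {{refreshFromPeers}} only counts the queries a client would otherwise issue.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical client-side model; not the DataStax Java driver's actual API.
public class PeersRefreshGuard {
    private final Set<String> knownHosts = new HashSet<>();
    int peersQueries = 0; // how many times this client would have read system.peers

    PeersRefreshGuard(Set<String> initialHosts) {
        knownHosts.addAll(initialHosts);
    }

    // Handle a NEW_NODE topology event pushed by the server.
    void onNewNodeEvent(String rpcAddress) {
        if (knownHosts.contains(rpcAddress)) {
            return; // already in local metadata (e.g. a restart): skip the scan
        }
        refreshFromPeers();
        knownHosts.add(rpcAddress);
    }

    private void refreshFromPeers() {
        // Stand-in for querying system.peers on the client's control connection.
        peersQueries++;
    }

    public static void main(String[] args) {
        PeersRefreshGuard g = new PeersRefreshGuard(Set.of("10.0.0.1", "10.0.0.2"));
        g.onNewNodeEvent("10.0.0.2"); // restarted node: no query
        g.onNewNodeEvent("10.0.0.9"); // genuinely new node: one query
        g.onNewNodeEvent("10.0.0.9"); // repeated event: no query
        System.out.println(g.peersQueries); // prints 1
    }
}
```

In this model the guard only helps with restarts: a genuinely new node still triggers one scan per client that handles the event, so a spurious server-side {{NEW_NODE}} remains costly for clients without such a check.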