[ https://issues.apache.org/jira/browse/CASSANDRA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joshua McKenzie updated CASSANDRA-10052:
----------------------------------------
    Fix Version/s:     (was: 3.0.0 rc2)
                   2.1.x

> Bringing one node down, makes the whole cluster go down for a second
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-10052
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10052
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sharvanath Pathak
>            Assignee: Stefania
>              Labels: client-impacting
>             Fix For: 2.1.x
>
>
> When a node goes down, the other nodes learn about it through gossip.
> I do see the log message from Gossiper.java:
> {code}
> private void markDead(InetAddress addr, EndpointState localState)
> {
>     if (logger.isTraceEnabled())
>         logger.trace("marking as down {}", addr);
>     localState.markDead();
>     liveEndpoints.remove(addr);
>     unreachableEndpoints.put(addr, System.nanoTime());
>     logger.info("InetAddress {} is now DOWN", addr);
>     for (IEndpointStateChangeSubscriber subscriber : subscribers)
>         subscriber.onDead(addr, localState);
>     if (logger.isTraceEnabled())
>         logger.trace("Notified " + subscribers);
> }
> {code}
> It says "InetAddress 192.168.101.1 is now DOWN" in Cassandra's system log.
> Now, on all the other nodes, the client side (Java driver) says "Cannot
> connect to any host, scheduling retry in 1000 milliseconds". The clients
> eventually reconnect, but some queries fail during this intermediate period.
> It seems to me that when the server pushes the nodeDown event, it calls
> getRpcAddress(endpoint) and thus sends localhost as the argument in the
> nodeDown event.
> As in org.apache.cassandra.transport.Server.java:
> {code}
> public void onDown(InetAddress endpoint)
> {
>     server.connectionTracker.send(Event.StatusChange.nodeDown(getRpcAddress(endpoint), server.socket.getPort()));
> }
> {code}
> getRpcAddress returns localhost for any endpoint if cassandra.yaml uses
> localhost as the value of rpc_address (which, by the way, is the default).
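
The suspicion in the description can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Cassandra or driver code: the class RpcAddressSketch and its methods are hypothetical, every node is assumed to advertise rpc_address "localhost", and each client is assumed to use "localhost" as its contact point.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: none of these names are Cassandra or Java driver classes.
final class RpcAddressSketch {

    // Hypothetical stand-in for the lookup described in the report: returns
    // the rpc_address a node advertises, falling back to the endpoint itself.
    static String rpcAddressFor(String endpoint, Map<String, String> advertisedRpc) {
        return advertisedRpc.getOrDefault(endpoint, endpoint);
    }

    // True when the address carried by a nodeDown event is the same one the
    // client uses to reach the node it is currently connected to.
    static boolean eventHitsOwnConnection(String eventAddress, String clientContactPoint) {
        return eventAddress.equals(clientContactPoint);
    }

    public static void main(String[] args) {
        // Assumption: both nodes keep rpc_address set to localhost.
        Map<String, String> advertisedRpc = new HashMap<>();
        advertisedRpc.put("192.168.101.1", "localhost");
        advertisedRpc.put("192.168.101.2", "localhost");

        // 192.168.101.1 goes down; a surviving node builds the event address.
        String eventAddress = rpcAddressFor("192.168.101.1", advertisedRpc);
        System.out.println("nodeDown event carries: " + eventAddress);
        System.out.println("matches the client's own contact point: "
                + eventHitsOwnConnection(eventAddress, "localhost"));
    }
}
```

In this model the event for the remote node is indistinguishable from an event about the node the client is attached to, which would be consistent with the brief "Cannot connect to any host" message reported above.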

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)