[ 
https://issues.apache.org/jira/browse/CASSANDRA-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-10052:
---------------------------------------
    Assignee: Stefania

I see.  Sounds like we should just special-case it and not send anything from 
onDown if a peer listening on localhost goes down.

> Bringing one node down, makes the whole cluster go down for a second
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-10052
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10052
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sharvanath Pathak
>            Assignee: Stefania
>            Priority: Critical
>
> When a node goes down, the other nodes learn that through the gossip.
> And I do see the log from (Gossiper.java):
> {code}
> private void markDead(InetAddress addr, EndpointState localState)
>    {
>        if (logger.isTraceEnabled())
>            logger.trace("marking as down {}", addr);
>        localState.markDead();
>        liveEndpoints.remove(addr);
>        unreachableEndpoints.put(addr, System.nanoTime());
>        logger.info("InetAddress {} is now DOWN", addr);
>        for (IEndpointStateChangeSubscriber subscriber : subscribers)
>            subscriber.onDead(addr, localState);
>        if (logger.isTraceEnabled())
>            logger.trace("Notified " + subscribers);
>    }
> {code}
> Saying: "InetAddress 192.168.101.1 is now Down", in the Cassandra's system 
> log.
> Now on all the other nodes the client side (java driver) says, " Cannot 
> connect to any host, scheduling retry in 1000 milliseconds". They eventually 
> do reconnect but some queries fail during this intermediate period.
> To me it seems like when the server pushes the nodeDown event, it call the 
> getRpcAddress(endpoint), and thus sends localhost as the argument in the 
> nodeDown event.  
> As in org.apache.cassandra.transport.Server.java
> {code}
>   public void onDown(InetAddress endpoint)
>        {      
>            
> server.connectionTracker.send(Event.StatusChange.nodeDown(getRpcAddress(endpoint),
>  server.socket.getPort()));
>        }
> {code}
> the getRpcAddress returns localhost for any endpoint if the cassandra.yaml is 
> using localhost as the configuration for rpc_address (which by the way is the 
> default).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to