[jira] [Updated] (CASSANDRA-7318) Unable to truncate column family on node which has been decommissioned and re-bootstrapped

Brandon Williams (JIRA) Tue, 17 Jun 2014 15:39:29 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Brandon Williams updated CASSANDRA-7318:
----------------------------------------

    Attachment: 7318.txt

The problem here is that gossip still holds a dead state for our IP from the 
decommission.  Normally this isn't a problem, since our generation would be 
newer and knock that state out.  During bootstrap, however, we do a shadow 
round to check for an existing endpoint and then fail to clean up the 
unreachable endpoints, which is what truncate checks.  I suspect there would 
be a similar problem with replace_address on the same IP.

Patch to also clear unreachableEndpoints and liveEndpoints so that the gossiper 
is more pristine when it really starts.
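
To illustrate the failure mode described above, here is a minimal toy model in 
Python.  It is not Cassandra's code and not the contents of 7318.txt: the class 
ToyGossiper, its methods, and the "dead"/"normal" state strings are all 
hypothetical names chosen for the sketch.  It only assumes what the comment 
states: the shadow round leaves a stale unreachable entry for our own IP, and 
truncate refuses to run while any endpoint is considered unreachable.

```python
class ToyGossiper:
    """Toy stand-in for a gossiper (hypothetical, not Cassandra's class)."""

    def __init__(self, local_ip):
        self.local_ip = local_ip
        self.endpoint_state = {}         # ip -> state string learned from peers
        self.live_endpoints = set()
        self.unreachable_endpoints = set()

    def shadow_round(self, peer_view):
        """Learn the peers' view of the ring before really joining."""
        for ip, state in peer_view.items():
            self.endpoint_state[ip] = state
            if state == "dead":
                self.unreachable_endpoints.add(ip)
            else:
                self.live_endpoints.add(ip)

    def reset_after_shadow_round(self, clear_membership):
        """Drop shadow-round state; optionally also clear live/unreachable sets."""
        self.endpoint_state.clear()
        if clear_membership:
            self.live_endpoints.clear()
            self.unreachable_endpoints.clear()

    def can_truncate(self):
        """Assumption: truncate is refused while any endpoint is unreachable."""
        return not self.unreachable_endpoints


def bootstrap(clear_membership):
    """Run a shadow round as node B and report whether truncate would pass."""
    gossiper = ToyGossiper("10.225.45.151")
    # Node A still remembers B's decommission as a dead state.
    peer_view = {"10.225.45.150": "normal", "10.225.45.151": "dead"}
    gossiper.shadow_round(peer_view)
    gossiper.reset_after_shadow_round(clear_membership)
    return gossiper.can_truncate()
```

In this sketch, bootstrap(False) models the old behaviour, where the stale 
entry for B survives the shadow round, and bootstrap(True) models clearing 
both sets so that the real gossip start begins from a clean slate.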

> Unable to truncate column family on node which has been decommissioned and 
> re-bootstrapped
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7318
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: Seen with cassandra 2.0.7 running on Red Hat Linux
>            Reporter: Thomas Whiteway
>            Assignee: Brandon Williams
>            Priority: Minor
>             Fix For: 2.0.9
>
>         Attachments: 7318.txt
>
>
> After decommissioning a node, then re-bootstrapping it, it's not possible to 
> truncate column families until cassandra is restarted.
> Steps to reproduce:
> - Start with a two node deployment (nodes A and B)
> - Run nodetool decommission on node B
> - Stop cassandra on node B
> - Delete the contents of the cassandra data and commitlog directories
> - Start cassandra on node B with node A as the seed
> - Run cqlsh on node B and try to truncate a column family
> - cqlsh displays: "Unable to complete request: one or more nodes were 
> unavailable."
> According to the logs, node B seems to think that it is itself down.  The 
> following logs appear when the server is started, and there are no further 
> logs to indicate that B is now UP (A=10.225.45.150, B=10.225.45.151):
>  INFO [main] 2014-05-29 10:40:11,090 MessagingService.java (line 461) 
> Starting Messaging Service on port 7000
>  INFO [HANDSHAKE-/10.225.45.150] 2014-05-29 10:40:11,106 
> OutboundTcpConnection.java (line 386) Handshaking version with /10.225.45.150
>  INFO [GossipStage:1] 2014-05-29 10:40:11,182 Gossiper.java (line 903) Node 
> /10.225.45.150 is now part of the cluster
>  INFO [GossipStage:1] 2014-05-29 10:40:11,185 Gossiper.java (line 883) 
> InetAddress /10.225.45.151 is now DOWN
>  INFO [RequestResponseStage:1] 2014-05-29 10:40:11,215 Gossiper.java (line 
> 869) InetAddress /10.225.45.150 is now UP
> This problem isn't hit if cassandra is restarted on node A while node B is 
> stopped.  The problem goes away if node B is restarted.
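
The symptom in the logs above (a DOWN line for an address with no later UP 
line) can be checked mechanically.  The following Python sketch is only an 
illustration: the function name endpoints_left_down is hypothetical, and it 
assumes the log text is in chronological order and uses the "InetAddress 
/<ip> is now UP|DOWN" wording shown in the report.

```python
import re

# Matches the gossiper status lines quoted in the report, e.g.
# "InetAddress /10.225.45.151 is now DOWN".
STATUS = re.compile(r"InetAddress\s+/(\S+)\s+is\s+now\s+(UP|DOWN)")


def endpoints_left_down(log_text):
    """Return the addresses whose last reported status in log_text is DOWN."""
    last_status = {}
    for match in STATUS.finditer(log_text):
        ip, status = match.group(1), match.group(2)
        last_status[ip] = status  # later lines overwrite earlier ones
    return sorted(ip for ip, status in last_status.items() if status == "DOWN")


SAMPLE_LOG = """\
 INFO [GossipStage:1] 2014-05-29 10:40:11,182 Gossiper.java (line 903) Node /10.225.45.150 is now part of the cluster
 INFO [GossipStage:1] 2014-05-29 10:40:11,185 Gossiper.java (line 883) InetAddress /10.225.45.151 is now DOWN
 INFO [RequestResponseStage:1] 2014-05-29 10:40:11,215 Gossiper.java (line 869) InetAddress /10.225.45.150 is now UP
"""

print(endpoints_left_down(SAMPLE_LOG))
```

Run against node B's own log, a result that contains B's own address matches 
the situation described in this report.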



--
This message was sent by Atlassian JIRA
(v6.2#6252)
