[
https://issues.apache.org/jira/browse/CASSANDRA-7318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brandon Williams updated CASSANDRA-7318:
----------------------------------------
Attachment: 7318.txt
The problem here is that the decommission leaves a dead state for our own IP in
gossip. Normally this isn't a problem, since our generation would be newer and
knock that state out. During bootstrap, however, we do a shadow round to check
for an existing endpoint and then fail to clean up the unreachable endpoints,
which is what truncate checks. I suspect there would be a similar problem with
replace_address on the same IP.
Patch to also clear unreachableEndpoints and liveEndpoints so that the gossiper
is more pristine when it really starts.
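To make the failure mode above concrete, here is a minimal toy model. It is
not Cassandra's Gossiper code: the class, method, and parameter names
(ShadowRoundSketch, ToyGossiper, applyRemoteState, resetAfterShadowRound,
truncateAllowed) are hypothetical, and it assumes that the pre-patch reset
cleared only the per-endpoint state and that truncate refuses to run while any
endpoint is listed as unreachable.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: all names here are illustrative, not Cassandra's actual API.
public class ShadowRoundSketch {

    static class ToyGossiper {
        // endpoint -> alive flag, as learned from peers
        final Map<String, Boolean> endpointStateMap = new HashMap<>();
        final Set<String> liveEndpoints = new HashSet<>();
        final Set<String> unreachableEndpoints = new HashSet<>();

        // Record state received from a peer during the shadow round.
        void applyRemoteState(String endpoint, boolean alive) {
            endpointStateMap.put(endpoint, alive);
            if (alive) {
                liveEndpoints.add(endpoint);
                unreachableEndpoints.remove(endpoint);
            } else {
                liveEndpoints.remove(endpoint);
                unreachableEndpoints.add(endpoint);
            }
        }

        // Reset performed once the shadow round is over.
        // Assumption: without the patch, only endpointStateMap is cleared.
        void resetAfterShadowRound(boolean clearLiveAndUnreachable) {
            endpointStateMap.clear();
            if (clearLiveAndUnreachable) {
                liveEndpoints.clear();
                unreachableEndpoints.clear();
            }
        }

        // Stand-in for the availability check that truncate performs.
        boolean truncateAllowed() {
            return unreachableEndpoints.isEmpty();
        }
    }

    // Runs the scenario from the report and says whether truncate would pass.
    public static boolean simulate(boolean patched) {
        ToyGossiper gossiper = new ToyGossiper();
        gossiper.applyRemoteState("10.225.45.150", true);  // node A, alive
        gossiper.applyRemoteState("10.225.45.151", false); // node B's stale dead state
        gossiper.resetAfterShadowRound(patched);
        return gossiper.truncateAllowed();
    }

    public static void main(String[] args) {
        System.out.println("unpatched: truncate allowed = " + simulate(false));
        System.out.println("patched:   truncate allowed = " + simulate(true));
    }
}
```

In this sketch the unpatched path leaves node B's own address in
unreachableEndpoints after the shadow round, so the truncate check keeps
failing; clearing both sets, as the patch describes, removes the stale entry.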
> Unable to truncate column family on node which has been decommissioned and
> re-bootstrapped
> ------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-7318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7318
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Environment: Seen with cassandra 2.0.7 running on Red Hat Linux
> Reporter: Thomas Whiteway
> Assignee: Brandon Williams
> Priority: Minor
> Fix For: 2.0.9
>
> Attachments: 7318.txt
>
>
> After decommissioning a node, then re-bootstrapping it, it's not possible to
> truncate column families until cassandra is restarted.
> Steps to reproduce:
> - Start with a two node deployment (nodes A and B)
> - Run nodetool decommission on node B
> - Stop cassandra on node B
> - Delete the contents of the cassandra data and commitlog directories
> - Start cassandra on node B with node A as the seed
> - Run cqlsh on node B and try to truncate a column family
> - cqlsh displays: "Unable to complete request: one or more nodes were
> unavailable."
> According to the logs, node B seems to think that it is itself down. The
> following logs appear when the server is started, and there are no further
> logs to indicate that B is now UP (A=10.225.45.150, B=10.225.45.151):
> INFO [main] 2014-05-29 10:40:11,090 MessagingService.java (line 461) Starting Messaging Service on port 7000
> INFO [HANDSHAKE-/10.225.45.150] 2014-05-29 10:40:11,106 OutboundTcpConnection.java (line 386) Handshaking version with /10.225.45.150
> INFO [GossipStage:1] 2014-05-29 10:40:11,182 Gossiper.java (line 903) Node /10.225.45.150 is now part of the cluster
> INFO [GossipStage:1] 2014-05-29 10:40:11,185 Gossiper.java (line 883) InetAddress /10.225.45.151 is now DOWN
> INFO [RequestResponseStage:1] 2014-05-29 10:40:11,215 Gossiper.java (line 869) InetAddress /10.225.45.150 is now UP
> This problem isn't hit if cassandra is restarted on node A while node B is
> stopped. The problem goes away if node B is restarted.