To Cassandra, the node where you deleted the files looks like a brand-new machine. Cassandra does not automatically rebuild such a node; this is deliberate, so that a machine is never replaced by accident. You need to tell it to build the “new” machine as a replacement for the “old” machine with that IP by setting -Dcassandra.replace_address_first_boot=<dead_node_ip>. See http://cassandra.apache.org/doc/latest/operating/topo_changes.html.
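As a rough sketch of how that flag might be passed: assuming JVM options for your install are set through the JVM_OPTS variable in cassandra-env.sh (the usual place, but check your packaging), a line like the one below could be added on the wiped node before starting it. The IP 10.0.0.2 is a made-up placeholder for node2's address, not a value from this thread.

```shell
# Hypothetical placeholder: substitute the real IP of the node whose data was wiped.
DEAD_NODE_IP="10.0.0.2"

# Append the replace flag to any existing JVM options, in the style of cassandra-env.sh.
JVM_OPTS="${JVM_OPTS:-} -Dcassandra.replace_address_first_boot=${DEAD_NODE_IP}"

# Print the resulting options so the setting can be checked before starting Cassandra.
echo "$JVM_OPTS"
```

As I understand it, the _first_boot variant is only honoured on the node's first start with empty data directories, so leaving the line in place afterwards should be harmless, but verify that against the docs for your version.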
Cheers
Ben

On Mon, 17 Oct 2016 at 16:41 Yuji Ito <y...@imagine-orb.com> wrote:

> Hi all,
>
> A failed node can rejoin a cluster.
> On the node, all data in /var/lib/cassandra were deleted.
> Is it normal?
>
> I can reproduce it as below.
>
> cluster:
> - C* 2.2.7
> - a cluster has node1, 2, 3
> - node1 is a seed
> - replication_factor: 3
>
> how to:
> 1) stop C* process and delete all data in /var/lib/cassandra on node2
>    ($ sudo rm -rf /var/lib/cassandra/*)
> 2) stop C* process on node1 and node3
> 3) restart C* on node1
> 4) restart C* on node2
>
> nodetool status after 4):
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
> DN  [node3 IP]  ?          256     100.0%            325553c6-3e05-41f6-a1f7-47436743816f  rack1
> UN  [node2 IP]  7.76 MB    256     100.0%            05bdb1d4-c39b-48f1-8248-911d61935925  rack1
> UN  [node1 IP]  416.13 MB  256     100.0%            a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
>
> If I restart C* on node2 while C* on node1 and node3 are running (without
> 2), 3)), a runtime exception happens.
> RuntimeException: "A node with address [node2 IP] already exists,
> cancelling join..."
>
> I'm not sure whether this causes data loss. All data can be read properly just
> after this rejoin.
> But some rows are lost when I kill & restart C* for destructive tests after
> this rejoin.
>
> Thanks.

--
Ben Slater
Chief Product Officer
Instaclustr: Cassandra + Spark - Managed | Consulting | Support
+61 437 929 798