Re: failure node rejoin

2016-11-24 Thread Yuji Ito
> You could certainly log a JIRA for the “failure node rejoin” issue (https://issues.apache.org/jira/browse/cassandra). It sounds like unexpected behaviour to me. However, I’m not sure it will be view

Re: failure node rejoin

2016-11-23 Thread Ben Slater
You could certainly log a JIRA for the “failure node rejoin” issue (https://issues.apache.org/jira/browse/cassandra). It sounds like unexpected behaviour to me. However, I’m not sure it wi

Re: failure node rejoin

2016-11-23 Thread Yuji Ito
Hi Ben, I continue to investigate the data loss issue. I'm investigating logs and source code and trying to reproduce the data loss issue with a simple test. I'm also trying my destructive test with DROP instead of TRUNCATE. BTW, I want to discuss the issue of the title "failure node rejoin"

Re: failure node rejoin

2016-11-10 Thread Ben Slater
From a quick look I couldn’t find any defects, other than the ones you’ve found, that seem potentially relevant to your issue (if anyone else on the list knows of one, please chime in). Maybe the next step, if you haven’t done so already, is to check your Cassandra logs for any signs of issues (ie

Re: failure node rejoin

2016-11-10 Thread Yuji Ito
Thanks Ben, I tried 2.2.8 and could reproduce the problem. So, I'm investigating some bug fixes of repair and commitlog between 2.2.8 and 3.0.9. - CASSANDRA-12508: "nodetool repair returns status code 0 for some errors" - CASSANDRA-12436: "Under some races commit log may incorrectly think it

Re: failure node rejoin

2016-11-08 Thread Ben Slater
There have been a few commit log bugs around in the last couple of months so perhaps you’ve hit something that was fixed recently. Would be interesting to know the problem is still occurring in 2.2.8. I suspect what is happening is that when you do your initial read (without flush) to check the

Re: failure node rejoin

2016-11-08 Thread Yuji Ito
I tried C* 3.0.9 instead of 2.2. The data loss problem hasn't happened so far (without `nodetool flush`). Thanks On Fri, Nov 4, 2016 at 3:50 PM, Yuji Ito wrote: > Thanks Ben, > > When I added `nodetool flush` on all nodes after step 2, the problem > didn't happen. > Did

Re: failure node rejoin

2016-11-04 Thread Yuji Ito
Thanks Ben, When I added `nodetool flush` on all nodes after step 2, the problem didn't happen. Did replay from old commit logs delete rows? Perhaps, the flush operation just detected that some nodes were down in step 2 (just after truncating tables). (Insertion and check in step2 would succeed

Re: failure node rejoin

2016-10-23 Thread Ben Slater
Definitely sounds to me like something is not working as expected but I don’t really have any idea what would cause that (other than the fairly extreme failure scenario). A couple of things I can think of to try to narrow it down: 1) Run nodetool flush on all nodes after step 2 - that will make
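Ben's first suggestion above hinges on the memtable/SSTable split. The following is a toy Python model, not Cassandra internals, and it deliberately ignores the commitlog (the very component under suspicion in this thread) just to isolate one point: writes sit in a volatile memtable until a flush persists them to an on-disk SSTable, which is why `nodetool flush` after step 2 changes what a crashed node can serve afterwards.

```python
# Toy model (NOT Cassandra internals; commitlog replay is ignored on purpose).
# Writes land in a volatile memtable; flush() moves them into an SSTable;
# a crash discards whatever was still only in the memtable.
class ToyNode:
    def __init__(self):
        self.memtable = {}   # volatile, in-memory
        self.sstables = {}   # durable, on-disk

    def write(self, key, value):
        self.memtable[key] = value

    def flush(self):
        # `nodetool flush` analogue: persist memtable contents to disk
        self.sstables.update(self.memtable)
        self.memtable.clear()

    def crash(self):
        self.memtable.clear()  # volatile state is lost

    def read(self, key):
        # fresher memtable data shadows the SSTable
        return self.memtable.get(key, self.sstables.get(key))

node = ToyNode()
node.write("k1", "v1")
node.crash()
print(node.read("k1"))   # None: unflushed write is gone

node.write("k1", "v1")
node.flush()
node.crash()
print(node.read("k1"))   # v1: flushed write survives the crash
```

In real Cassandra the commitlog would replay unflushed writes on restart; the point of flushing in this test is to take that replay path out of the picture.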

Re: failure node rejoin

2016-10-21 Thread Ben Slater
Just to confirm, are you saying: a) after operation 2, you select all and get 1000 rows b) after operation 3 (which only does updates and read) you select and only get 953 rows? If so, that would be very unexpected. If you run your tests without killing nodes do you get the expected (1,000) rows?

Re: failure node rejoin

2016-10-21 Thread Yuji Ito
> Are you certain your tests don’t generate any overlapping inserts (by PK)? Yes. The operation 2) also checks the number of rows just after all insertions. On Fri, Oct 21, 2016 at 2:51 PM, Ben Slater wrote: > OK. Are you certain your tests don’t generate any

Re: failure node rejoin

2016-10-20 Thread Ben Slater
OK. Are you certain your tests don’t generate any overlapping inserts (by PK)? Cassandra basically treats any inserts with the same primary key as updates (so 1000 insert operations may not necessarily result in 1000 rows in the DB). On Fri, 21 Oct 2016 at 16:30 Yuji Ito
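The upsert semantics Ben describes can be illustrated with a plain dict keyed by primary key (a sketch, not driver code): N insert operations with overlapping PKs collapse into fewer rows, which is one benign explanation for a row-count mismatch.

```python
# Sketch of Cassandra's insert-as-upsert behaviour: an INSERT with an
# existing primary key overwrites the old row instead of adding a new one.
import random

random.seed(0)
table = {}  # primary key -> row
# 1000 insert operations drawn from only 900 possible keys,
# so some PKs necessarily repeat (pigeonhole principle).
ops = [(random.randrange(900), "some value") for _ in range(1000)]
for pk, value in ops:
    table[pk] = value  # same-PK insert behaves as an update

print(len(ops))    # 1000 insert operations...
print(len(table))  # ...but strictly fewer distinct rows
```

Yuji's reply above rules this out for his test: operation 2 verifies the row count immediately after the insertions, so the later mismatch is not an upsert artifact.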

Re: failure node rejoin

2016-10-20 Thread Yuji Ito
Thanks Ben, > 1) At what stage did you have (or expect to have) 1000 rows (and have the mismatch between actual and expected) - at the end of operation (2) or after operation (3)? after operation 3), at operation 4) which reads all rows by cqlsh with CL.SERIAL > 2) What replication factor and

Re: failure node rejoin

2016-10-20 Thread Ben Slater
A couple of questions: 1) At what stage did you have (or expect to have) 1000 rows (and have the mismatch between actual and expected) - at the end of operation (2) or after operation (3)? 2) What replication factor and replication strategy is used by the test keyspace? What consistency level is
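Ben's second question matters because of the quorum arithmetic behind read/write visibility. A minimal sketch of that arithmetic (standard Cassandra rule, stated here from memory: a read is guaranteed to see the latest write when write replicas + read replicas exceed RF):

```python
# Quorum arithmetic behind the RF / consistency-level question:
# QUORUM touches floor(RF/2) + 1 replicas, and a read sees the latest
# write whenever (replicas written) + (replicas read) > RF.
def quorum(rf: int) -> int:
    return rf // 2 + 1

rf = 3                    # the cluster in this thread uses RF=3
w = quorum(rf)            # QUORUM write lands on 2 of 3 replicas
r = quorum(rf)            # QUORUM read consults 2 of 3 replicas
print(w, r, w + r > rf)   # 2 2 True: read and write sets must overlap
```

With weaker levels (e.g. write at ONE, read at ONE: 1 + 1 = 2, not > 3), a read can legitimately miss recent writes, so the CL used in each step is essential to interpreting the missing rows.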

Re: failure node rejoin

2016-10-20 Thread Yuji Ito
Thanks Ben, I tried to run a rebuild and repair after the failure node rejoined the cluster as a "new" node with -Dcassandra.replace_address_first_boot. The failure node could rejoin and I could read all rows successfully. (Sometimes a repair failed because the node couldn't access other nodes. If

Re: failure node rejoin

2016-10-17 Thread Ben Slater
OK, that’s a bit more unexpected (to me at least) but I think the solution of running a rebuild or repair still applies. On Tue, 18 Oct 2016 at 15:45 Yuji Ito wrote: > Thanks Ben, Jeff > > Sorry that my explanation confused you. > > Only node1 is the seed node. > Node2

Re: failure node rejoin

2016-10-17 Thread Yuji Ito
Thanks Ben, Jeff Sorry that my explanation confused you. Only node1 is the seed node. Node2, whose C* data was deleted, is NOT a seed. I restarted the failure node (node2) after restarting the seed node (node1). Restarting node2 succeeded without the exception. (I couldn't restart node2 before

Re: failure node rejoin

2016-10-17 Thread Jeff Jirsa
The unstated "problem" here is that node1 is a seed, which implies auto_bootstrap=false (you can't bootstrap a seed, so it was almost certainly set up to start without bootstrapping). That means once the data dir is wiped, it's going to start again without a bootstrap, and make a single node
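Jeff's rule can be restated as a tiny decision function (a toy restatement of the behaviour he describes, not Cassandra's actual startup code): a seed never bootstraps, so a wiped seed rejoins empty instead of streaming data from its peers.

```python
# Toy restatement of the seed/bootstrap rule described above:
# seeds skip bootstrap unconditionally; other nodes stream data
# on first start only when auto_bootstrap is true.
def will_stream_on_start(node_ip: str, seeds: set, auto_bootstrap: bool) -> bool:
    if node_ip in seeds:
        return False  # a seed never bootstraps, even with an empty data dir
    return auto_bootstrap

seeds = {"node1"}
print(will_stream_on_start("node1", seeds, True))   # False: a wiped seed comes back empty
print(will_stream_on_start("node2", seeds, True))   # True: a non-seed bootstraps normally
```

As Yuji clarifies in his follow-up, the wiped node in his test was node2, a non-seed, so this particular explanation doesn't fully cover his scenario.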

Re: failure node rejoin

2016-10-17 Thread Ben Slater
OK, sorry - I think I understand what you are asking now. However, I’m still a little confused by your description. I think your scenario is: 1) Stop C* on all nodes in a cluster (Nodes A, B, C) 2) Delete all data from Node A 3) Restart Node A 4) Restart Nodes B, C. Is this correct? If so, this isn’t

Re: failure node rejoin

2016-10-17 Thread Yabin Meng
The exception you ran into is expected behavior. This is because, as Ben pointed out, when you delete everything (including system schemas), the C* cluster thinks you're bootstrapping a new node. However, node2's IP is still in gossip, and this is why you see the exception. I'm not clear on the reasoning

Re: failure node rejoin

2016-10-16 Thread Ben Slater
To cassandra, the node where you deleted the files looks like a brand new machine. It doesn’t automatically rebuild machines to prevent accidental replacement. You need to tell it to build the “new” machines as a replacement for the “old” machine with that IP by setting

failure node rejoin

2016-10-16 Thread Yuji Ito
Hi all, A failure node can rejoin a cluster even though all data in /var/lib/cassandra on that node was deleted. Is this normal? I can reproduce it as below. cluster: - C* 2.2.7 - a cluster has node1, 2, 3 - node1 is a seed - replication_factor: 3 how to: 1) stop C* process and delete all data in