Thanks for taking the time to share this; I guess it might be useful to other people around here to know the end of the story ;-). Glad this worked for you,

C*heers,
-----------------------
Alain Rodriguez - al...@thelastpickle.com
France / Spain

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

On Fri, Aug 16, 2019 at 08:16, Alex <m...@aca-o.com> wrote:

> Hello Alain,
>
> Long time - I had to wait for a quiet week to try this. I finally did, so
> I thought I'd give you some feedback.
>
> Short reminder: one of the nodes of my 3.9 cluster died and I replaced
> it. But it still appeared in nodetool status: on one node with a "null"
> host_id, and on another with the same host_id as its replacement.
> nodetool assassinate failed, and I could not decommission or remove any
> other node on the cluster.
>
> Basically, after taking a backup and preparing another cluster in case
> anything went wrong, I did:
>
> DELETE FROM system.peers WHERE peer = '192.168.1.18';
>
> and restarted Cassandra on the two nodes still seeing the zombie node.
>
> After the first restart, the Cassandra system.log was filled with:
>
> WARN [MutationStage-2] 2019-08-15 15:31:44,735
> AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread
> Thread[MutationStage-2,5,main]:
> java.lang.NullPointerException: null
>
> So... I restarted again. The error disappeared. I ran a full repair and
> everything seems to be back in order. I could decommission a node without
> problem.
>
> Thanks for your help!
>
> Alex
>
>
> On 05.04.2019 10:55, Alain RODRIGUEZ wrote:
>
> Alex,
>
>> Well, I tried: the rolling restart did not work its magic.
>
> Sorry to hear that, and sorry for misleading you. My faith in the rolling
> restart's magical power went down a bit, but I still think it was worth a
> try :D.
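For anyone landing on this thread later, the sequence Alex reports above can be sketched as a few commands. This is only a sketch of what was reported, not a general recipe: the IP address is the dead node's from this thread, the service name and restart method vary by installation, and a snapshot first is assumed as a precaution.

```shell
# Sketch only - run on each node that still lists the ghost entry.
# '192.168.1.18' is the dead node from this thread; adjust to yours.

# Snapshot the system keyspace first, in case something goes wrong.
nodetool snapshot -t before-peers-fix system

# Remove the ghost node's row from the local peers table.
cqlsh -e "DELETE FROM system.peers WHERE peer = '192.168.1.18';"

# Restart Cassandra (service name is an assumption; Alex needed two
# restarts before the NullPointerExceptions stopped).
sudo systemctl restart cassandra

# Finally, a full repair to confirm everything is back in order.
nodetool repair --full
```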
> >> @ Alain: In system.peers I see both the dead node and its replacement
> >> with the same ID:
> >>
> >>  peer         | host_id
> >> --------------+--------------------------------------
> >>  192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
> >>  192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
> >>
> >> Is it expected?
> >>
> >> If I cannot fix this, I think I will add new nodes and remove, one by
> >> one, the nodes that show the dead node in nodetool status.
> >>
> > Well, no. This is clearly not good or expected, I would say.
> >
> > *tl;dr - Suggested fix:*
> > What I would try, to fix this, is removing this row. It *should* be
> > safe, but that's only my opinion, and on the condition that you remove
> > *only* the 'ghost/dead' nodes. Any mistake here would probably be
> > costly. Again, be aware that you're touching a sensitive part when
> > messing with system tables. Think it through twice, check it twice,
> > take a copy of the SSTables/a snapshot. Then I would go for it and
> > observe the changes on one node first. If no harm is done, continue to
> > the next node.
> >
> > Considering the old node is '192.168.1.18', I would run this on all
> > nodes (maybe after testing on one node) to keep it simple, or just run
> > it on the nodes that show the ghost node(s):
> >
> > *"DELETE FROM system.peers WHERE peer = '192.168.1.18';"*
> >
> > Maybe you will need to restart afterwards; I think you won't even need
> > it. I have good hope that this should finally fix your issue with no
> > harm.
> >
> > *More context - Idea of the problem:*
> > The above is clearly an issue, I would say - most probably the source
> > of your troubles here. The problem is that I lack understanding: from
> > where I stand, this kind of bug should not happen anymore in Cassandra
> > (I have not seen anything similar for a while).
> >
> > I would blame:
> > - A corner case scenario (unlikely, system tables have been rather
> > solid for a while). Or maybe you are using an old C* version.
> > It *might* be related to this (or similar):
> > https://issues.apache.org/jira/browse/CASSANDRA-7122
> > - A really weird operation (a succession of actions might have put you
> > in this state, but it is hard for me to say what)
> > - KairosDB? I don't know it or what it does. Might it be less reliable
> > than Cassandra is, and have led to this issue? Maybe; I have no clue
> > once again.
> >
> > *Risk of this operation and current situation:*
> > Also, I *think* the current situation is relatively 'stable' (maybe
> > just some hints being stored for nothing, and possibly not being able
> > to add more nodes or change the schema?). This is the kind of situation
> > where 'rushing' a solution without understanding the impacts and risks
> > can make things go terribly wrong. Take the time to analyse my
> > suggested fix, maybe read the ticket above, etc. When you're ready,
> > back up the data, prepare the DELETE command carefully, and observe how
> > one node reacts to the fix first.
> >
> > As you can see, I think it's the 'good' fix, but I'm not comfortable
> > with this operation. And you should not be either :).
> > To share my feeling about this operation, I would say, arbitrarily,
> > that there is a 95% chance this does no harm, and a 90% chance it fixes
> > the issue; but if we are in the 5% where it does not go well, there is
> > a non-negligible probability that you will destroy your cluster in a
> > very bad way. I guess I am trying to say: be careful, watch your step,
> > make sure you remove the right row, and ensure it works on one node
> > with no harm.
> > I shared my feeling, and I would try this fix. But it's ultimately your
> > responsibility, and I won't be behind the machine when you fix it. None
> > of us will.
> >
> > Good luck! :)
> >
> > C*heers,
> > -----------------------
> > Alain Rodriguez - al...@thelastpickle.com
> > France / Spain
> >
> > The Last Pickle - Apache Cassandra Consulting
> > http://www.thelastpickle.com
> >
> >
> > On Thu, Apr 4,
2019 at 19:29, Kenneth Brotman <kenbrot...@yahoo.com.invalid> wrote:
>
>> Alex,
>>
>> According to this TLP article
>> http://thelastpickle.com/blog/2018/09/18/assassinate.html :
>>
>> Note that the LEFT status should stick around for 72 hours to ensure
>> all nodes come to the consensus that the node has been removed. So
>> please don't rush things if that's the case. Again, it's only cosmetic.
>>
>> If a gossip state will not forget a node that was removed from the
>> cluster more than a week ago:
>>
>> - Log in to each node within the Cassandra cluster.
>> - Download jmxterm on each node, if nodetool assassinate is not an
>>   option.
>> - Run nodetool assassinate, or the unsafeAssassinateEndpoint command,
>>   multiple times in quick succession.
>>   - I typically recommend running the command 3-5 times within 2
>>     seconds.
>>   - I understand that sometimes the command takes time to return, so
>>     the "2 seconds" suggestion is less of a requirement than it is a
>>     mindset.
>>   - Also, sometimes 3-5 times isn't enough. In such cases, shoot for
>>     the moon and try 20 assassination attempts in quick succession.
>>
>> What we are trying to do is create a flood of messages requesting that
>> all nodes completely forget there used to be an entry within the gossip
>> state for the given IP address. If each node can prune its own gossip
>> state and broadcast that to the rest of the nodes, we should eliminate
>> any race conditions that may exist where at least one node still
>> remembers the given IP address.
>>
>> As soon as all nodes come to agreement that they don't remember the
>> deprecated node, the cosmetic issue will no longer be a concern in any
>> system.logs, nodetool describecluster commands, nor nodetool gossipinfo
>> output.
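The "multiple times in quick succession" step from the article can be scripted. A hedged sketch only: the IP is the ghost node's address from this thread, and nodetool assassinate is assumed to be available (it was added around Cassandra 2.2; older versions need unsafeAssassinateEndpoint via JMX/jmxterm instead).

```shell
# Sketch: fire several assassinate attempts back to back, per the
# article's 3-5 attempts suggestion. Adjust the IP to your ghost node.
for i in $(seq 1 5); do
  nodetool assassinate 192.168.1.18
done
```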
>> >> >> >> >> >> -----Original Message----- >> From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] >> Sent: Thursday, April 04, 2019 10:40 AM >> To: user@cassandra.apache.org >> Subject: RE: Assassinate fails >> >> Alex, >> >> Did you remove the option JVM_OPTS="$JVM_OPTS >> -Dcassandra.replace_address=address_of_dead_node after the node started and >> then restart the node again? >> >> Are you sure there isn't a typo in the file? >> >> Ken >> >> >> -----Original Message----- >> From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID] >> Sent: Thursday, April 04, 2019 10:31 AM >> To: user@cassandra.apache.org >> Subject: RE: Assassinate fails >> >> I see; system_auth is a separate keyspace. >> >> -----Original Message----- >> From: Jon Haddad [mailto:j...@jonhaddad.com] >> Sent: Thursday, April 04, 2019 10:17 AM >> To: user@cassandra.apache.org >> Subject: Re: Assassinate fails >> >> No, it can't. As Alain (and I) have said, since the system keyspace >> is local strategy, it's not replicated, and thus can't be repaired. >> >> On Thu, Apr 4, 2019 at 9:54 AM Kenneth Brotman >> <kenbrot...@yahoo.com.invalid> wrote: >> > >> > Right, could be similar issue, same type of fix though. >> > >> > -----Original Message----- >> > From: Jon Haddad [mailto:j...@jonhaddad.com] >> > Sent: Thursday, April 04, 2019 9:52 AM >> > To: user@cassandra.apache.org >> > Subject: Re: Assassinate fails >> > >> > System != system_auth. >> > >> > On Thu, Apr 4, 2019 at 9:43 AM Kenneth Brotman >> > <kenbrot...@yahoo.com.invalid> wrote: >> > > >> > > From Mastering Cassandra: >> > > >> > > >> > > Forcing read repairs at consistency – ALL >> > > >> > > The type of repair isn't really part of the Apache Cassandra repair >> paradigm at all. When it was discovered that a read repair will trigger >> 100% of the time when a query is run at ALL consistency, this method of >> repair started to gain popularity in the community. 
>> > > In some cases, this method of forcing data consistency provided
>> > > better results than normal, scheduled repairs.
>> > >
>> > > Let's assume, for a second, that an application team is having a
>> > > hard time logging into a node in a new data center. You try to
>> > > cqlsh out to these nodes, and notice that you are also experiencing
>> > > intermittent failures, leading you to suspect that the system_auth
>> > > tables might be missing a replica or two. On one node you do manage
>> > > to connect successfully using cqlsh. One quick way to fix
>> > > consistency on the system_auth tables is to set consistency to ALL,
>> > > and run an unbound SELECT on every table, tickling each record:
>> > >
>> > > use system_auth;
>> > > consistency ALL;
>> > > Consistency level set to ALL.
>> > >
>> > > SELECT COUNT(*) FROM resource_role_permissons_index;
>> > > SELECT COUNT(*) FROM role_permissions;
>> > > SELECT COUNT(*) FROM role_members;
>> > > SELECT COUNT(*) FROM roles;
>> > >
>> > > This problem is often seen when logging in with the default
>> > > cassandra user. Within cqlsh, there is code that forces the default
>> > > cassandra user to connect by querying system_auth at QUORUM
>> > > consistency. This can be problematic in larger clusters, and is
>> > > another reason why you should never use the default cassandra user.
>> > >
>> > > -----Original Message-----
>> > > From: Jon Haddad [mailto:j...@jonhaddad.com]
>> > > Sent: Thursday, April 04, 2019 9:21 AM
>> > > To: user@cassandra.apache.org
>> > > Subject: Re: Assassinate fails
>> > >
>> > > Ken,
>> > >
>> > > Alain is right about the system tables. What you're describing only
>> > > works on non-local tables. Changing the CL doesn't help with
>> > > keyspaces that use LocalStrategy.
>> > > Here's the definition of the system keyspace:
>> > >
>> > > CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}
>> > > AND durable_writes = true;
>> > >
>> > > Jon
>> > >
>> > > On Thu, Apr 4, 2019 at 9:03 AM Kenneth Brotman
>> > > <kenbrot...@yahoo.com.invalid> wrote:
>> > > >
>> > > > The trick below I got from the book Mastering Cassandra. You have
>> > > > to set the consistency to ALL for it to work. I thought you guys
>> > > > knew that one.
>> > > >
>> > > > From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
>> > > > Sent: Thursday, April 04, 2019 8:46 AM
>> > > > To: user@cassandra.apache.org
>> > > > Subject: Re: Assassinate fails
>> > > >
>> > > > Hi Alex,
>> > > >
>> > > > About the previous advice:
>> > > >
>> > > > You might have inconsistent data in your system tables. Try
>> > > > setting the consistency level to ALL, then do a read query of the
>> > > > system tables to force a repair.
>> > > >
>> > > > System tables use 'LocalStrategy', thus I don't think any repair
>> > > > would happen for the system.* tables, regardless of the
>> > > > consistency level you use. It should not harm, but I really think
>> > > > it won't help.
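Jon's and Alain's point, that the read-at-ALL trick only helps replicated keyspaces, can be checked directly on a cluster. A sketch, assuming the Cassandra 3.x schema tables:

```sql
-- Compare the replication settings of the two keyspaces discussed above.
SELECT keyspace_name, replication
  FROM system_schema.keyspaces
 WHERE keyspace_name IN ('system', 'system_auth');

-- 'system' uses LocalStrategy: each node owns its data alone, so neither
-- repair nor reads at CL=ALL move data between nodes.
-- 'system_auth' is replicated (SimpleStrategy by default), so the
-- read-at-ALL trick from Mastering Cassandra can apply to it.
```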
>> > > > >> > > > >> > > >> > > --------------------------------------------------------------------- >> > > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org >> > > For additional commands, e-mail: user-h...@cassandra.apache.org >> > > >> > > >> > > --------------------------------------------------------------------- >> > > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org >> > > For additional commands, e-mail: user-h...@cassandra.apache.org >> > > >> > >> > --------------------------------------------------------------------- >> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org >> > For additional commands, e-mail: user-h...@cassandra.apache.org >> > >> > >> > --------------------------------------------------------------------- >> > To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org >> > For additional commands, e-mail: user-h...@cassandra.apache.org >> > >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org >> For additional commands, e-mail: user-h...@cassandra.apache.org >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org >> For additional commands, e-mail: user-h...@cassandra.apache.org >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org >> For additional commands, e-mail: user-h...@cassandra.apache.org >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org >> For additional commands, e-mail: user-h...@cassandra.apache.org >> > >