Hello Alain, long time - I had to wait for a quiet week to try this. I
finally did, and I thought I'd give you some feedback.
Short reminder: one of the nodes of my 3.9 cluster died and I replaced
it. But it still appeared in nodetool status, on one node with a "null"
host_id and on another with the same host_id as its replacement.
nodetool assassinate failed, and I could not decommission or remove any
other node in the cluster.

Basically, after taking a backup and preparing another cluster in case
anything went wrong, I ran:

DELETE FROM system.peers WHERE peer = '192.168.1.18';

and restarted Cassandra on the two nodes still seeing the zombie node.
After the first restart, the Cassandra system.log was filled with:

WARN  [MutationStage-2] 2019-08-15 15:31:44,735 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[MutationStage-2,5,main]:
java.lang.NullPointerException: null

So... I restarted again. The error disappeared. I ran a full repair and
everything seems to be back in order. I could decommission a node
without problem.

Thanks for your help !

Alex

On 05.04.2019 at 10:55, Alain RODRIGUEZ wrote:
> Alex,
>
>> Well, I tried : rolling restart did not work its magic.
>
> Sorry to hear that, and for misleading you. My faith in the rolling
> restart's magical power went down a bit, but I still think it was
> worth a try :D.
>
>> @ Alain : In system.peers I see both the dead node and its
>> replacement with the same ID :
>>
>>  peer         | host_id
>> --------------+--------------------------------------
>>  192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>>  192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>>
>> Is it expected ?
>>
>> If I cannot fix this, I think I will add new nodes and remove, one by
>> one, the nodes that show the dead node in nodetool status.
>
> Well, no. This is clearly not good or expected, I would say.
>
> TL;DR - SUGGESTED FIX:
> What I would try, to fix this, is removing this row. It *should* be
> safe, but that's only my opinion, and only on the condition that you
> remove *only* the 'ghost/dead' nodes. Any mistake here would probably
> be costly.
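For anyone landing on this thread with the same zombie-node symptoms,
the sequence that worked above can be sketched as a script. The IP is
the one from this thread; the DRY_RUN guard is my addition, so the
script only prints the commands until you deliberately flip it. Treat
it as a sketch of the steps, not a turnkey fix.

```shell
#!/bin/sh
# Sketch of the recovery sequence from this thread. Assumes the dead
# node is 192.168.1.18 and that you run this on each node still listing
# the zombie in `nodetool status`. With DRY_RUN=1 (the default) it only
# prints the commands it would run.
DRY_RUN=${DRY_RUN:-1}
DEAD_IP="192.168.1.18"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "WOULD RUN: $*"
  else
    "$@"
  fi
}

# 1. Snapshot the system keyspace first -- messing with system tables
#    without a backup is how clusters die.
run nodetool snapshot system

# 2. Check what system.peers actually contains for the dead node.
run cqlsh -e "SELECT peer, host_id FROM system.peers WHERE peer = '$DEAD_IP';"

# 3. Remove the stale row (only the dead node's row!).
run cqlsh -e "DELETE FROM system.peers WHERE peer = '$DEAD_IP';"

# 4. Restart Cassandra on this node. In this thread a *second* restart
#    was needed to clear NullPointerExceptions in MutationStage, and a
#    full repair followed once the errors were gone.
run sudo systemctl restart cassandra
```

The restart command assumes a systemd install; substitute whatever your
distribution uses.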
> Again, be aware that you're touching a sensitive part when messing
> with system tables. Think twice, check twice, take a copy of the
> SSTables / a snapshot. Then I would go for it and observe the changes
> on one node first. If no harm is done, continue to the next node.
>
> Considering the old node is '192.168.1.18', I would run this on all
> nodes (maybe after testing on one node) to make it simple, or just
> run it on the nodes that show the ghost node(s):
>
> "DELETE FROM system.peers WHERE peer = '192.168.1.18';"
>
> Maybe you will need to restart; I think you won't even need that. I
> have good hope that this should finally fix your issue with no harm.
>
> MORE CONTEXT - IDEA OF THE PROBLEM:
> The above is clearly an issue, I would say, and most probably the
> source of your troubles here. The problem is that I lack
> understanding. From where I stand, this kind of bug should not happen
> anymore in Cassandra (I have not seen anything similar for a while).
>
> I would blame:
> - A corner case scenario (unlikely, the system tables have been
> rather solid for a while). Or maybe you are using an old C* version.
> It *might* be related to this (or something similar):
> https://issues.apache.org/jira/browse/CASSANDRA-7122
> - A really weird operation (a succession of actions might have put
> you in this state, but it is hard for me to say what)
> - KairosDB? I don't know it or what it does. Might it be less
> reliable than Cassandra is, and have led to this issue? Maybe, I have
> no clue once again.
>
> RISK OF THIS OPERATION AND CURRENT SITUATION:
> Also, I *think* the current situation is relatively 'stable' (maybe
> just some hints being stored for nothing, and possibly not being able
> to add more nodes or change the schema?). This is the kind of
> situation where 'rushing' a solution without understanding the
> impacts and risks can make things go terribly wrong. Take the time to
> analyse my suggested fix, maybe read the ticket above, etc.
> When you're ready, back up the data, prepare the DELETE command
> carefully, and observe how one node reacts to the fix first.
>
> As you can see, I think it's the 'good' fix, but I'm not comfortable
> with this operation. And you should not be either :).
> I would say, arbitrarily, to share my feeling about this operation,
> that there is a 95% chance this does not hurt and a 90% chance to fix
> the issue with that; but if something goes wrong, if we are in the 5%
> where it does not go well, there is a non-negligible probability that
> you will destroy your cluster in a very bad way. I guess what I am
> trying to say is: be careful, watch your step, make sure you remove
> the right row, and ensure it works on one node with no harm.
> I shared my feeling and I would try this fix. But it's ultimately
> your responsibility, and I won't be behind the machine when you fix
> it. None of us will.
>
> Good luck ! :)
>
> C*heers,
>
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Thu, Apr 4, 2019 at 19:29, Kenneth Brotman
> <kenbrot...@yahoo.com.invalid> wrote:
>
>> Alex,
>>
>> According to this TLP article
>> http://thelastpickle.com/blog/2018/09/18/assassinate.html :
>>
>> Note that the LEFT status should stick around for 72 hours to ensure
>> all nodes come to the consensus that the node has been removed. So
>> please don't rush things if that's the case. Again, it's only
>> cosmetic.
>>
>> If a gossip state will not forget a node that was removed from the
>> cluster more than a week ago:
>>
>> - Log in to each node within the Cassandra cluster.
>> - Download jmxterm on each node, if nodetool assassinate is not an
>>   option.
>> - Run nodetool assassinate, or the unsafeAssassinateEndpoint
>>   command, multiple times in quick succession.
>> - I typically recommend running the command 3-5 times within 2
>>   seconds.
>> I understand that sometimes the command takes time to return, so the
>> "2 seconds" suggestion is less of a requirement than it is a
>> mindset. Also, sometimes 3-5 times isn't enough. In such cases,
>> shoot for the moon and try 20 assassination attempts in quick
>> succession.
>>
>> What we are trying to do is create a flood of messages requesting
>> that all nodes completely forget there used to be an entry within
>> the gossip state for the given IP address. If each node can prune
>> its own gossip state and broadcast that to the rest of the nodes, we
>> should eliminate any race conditions that may exist where at least
>> one node still remembers the given IP address.
>>
>> As soon as all nodes come to agreement that they don't remember the
>> deprecated node, the cosmetic issue will no longer be a concern in
>> any system.logs, nodetool describecluster commands, or nodetool
>> gossipinfo output.
>>
>> -----Original Message-----
>> From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID]
>> Sent: Thursday, April 04, 2019 10:40 AM
>> To: user@cassandra.apache.org
>> Subject: RE: Assassinate fails
>>
>> Alex,
>>
>> Did you remove the option
>> JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node"
>> after the node started, and then restart the node again?
>>
>> Are you sure there isn't a typo in the file?
>>
>> Ken
>>
>> -----Original Message-----
>> From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID]
>> Sent: Thursday, April 04, 2019 10:31 AM
>> To: user@cassandra.apache.org
>> Subject: RE: Assassinate fails
>>
>> I see; system_auth is a separate keyspace.
>>
>> -----Original Message-----
>> From: Jon Haddad [mailto:j...@jonhaddad.com]
>> Sent: Thursday, April 04, 2019 10:17 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Assassinate fails
>>
>> No, it can't. As Alain (and I) have said, since the system keyspace
>> uses LocalStrategy, it's not replicated, and thus can't be repaired.
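Kenneth's "assassinate flood" advice quoted above can be scripted. The
loop below is a sketch assuming the same dead IP as in this thread;
the DRY_RUN guard is mine, and by default the script only prints what
it would run.

```shell
#!/bin/sh
# Sketch of the "assassinate flood" from the TLP article quoted above:
# fire nodetool assassinate several times in quick succession so every
# node prunes its gossip state before a race can resurrect the entry.
# DRY_RUN=1 (the default) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
DEAD_IP="192.168.1.18"
ATTEMPTS=5   # 3-5 is the usual advice; up to 20 for stubborn entries

i=1
while [ "$i" -le "$ATTEMPTS" ]; do
  if [ "$DRY_RUN" = "1" ]; then
    echo "WOULD RUN: nodetool assassinate $DEAD_IP"
  else
    # Background the calls so they land in quick succession.
    nodetool assassinate "$DEAD_IP" &
  fi
  i=$((i + 1))
done
wait
```

Backgrounding the calls is my reading of "within 2 seconds": the
article frames it as a mindset, so sequential calls may be fine too.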
>>
>> On Thu, Apr 4, 2019 at 9:54 AM Kenneth Brotman
>> <kenbrot...@yahoo.com.invalid> wrote:
>>>
>>> Right, it could be a similar issue; it's the same type of fix
>>> though.
>>>
>>> -----Original Message-----
>>> From: Jon Haddad [mailto:j...@jonhaddad.com]
>>> Sent: Thursday, April 04, 2019 9:52 AM
>>> To: user@cassandra.apache.org
>>> Subject: Re: Assassinate fails
>>>
>>> System != system_auth.
>>>
>>> On Thu, Apr 4, 2019 at 9:43 AM Kenneth Brotman
>>> <kenbrot...@yahoo.com.invalid> wrote:
>>>>
>>>> From Mastering Cassandra:
>>>>
>>>> Forcing read repairs at consistency - ALL
>>>>
>>>> This type of repair isn't really part of the Apache Cassandra
>>>> repair paradigm at all. When it was discovered that a read repair
>>>> will trigger 100% of the time when a query is run at ALL
>>>> consistency, this method of repair started to gain popularity in
>>>> the community. In some cases, this method of forcing data
>>>> consistency provided better results than normal, scheduled
>>>> repairs.
>>>>
>>>> Let's assume, for a second, that an application team is having a
>>>> hard time logging into a node in a new data center. You try to
>>>> cqlsh out to these nodes, and notice that you are also
>>>> experiencing intermittent failures, leading you to suspect that
>>>> the system_auth tables might be missing a replica or two. On one
>>>> node you do manage to connect successfully using cqlsh. One quick
>>>> way to fix consistency on the system_auth tables is to set
>>>> consistency to ALL, and run an unbound SELECT on every table,
>>>> tickling each record:
>>>>
>>>> use system_auth ;
>>>> consistency ALL;
>>>> Consistency level set to ALL.
>>>>
>>>> SELECT COUNT(*) FROM resource_role_permissons_index ;
>>>> SELECT COUNT(*) FROM role_permissions ;
>>>> SELECT COUNT(*) FROM role_members ;
>>>> SELECT COUNT(*) FROM roles;
>>>>
>>>> This problem is often seen when logging in with the default
>>>> cassandra user.
>>>> Within cqlsh, there is code that forces the default cassandra user
>>>> to connect by querying system_auth at QUORUM consistency. This can
>>>> be problematic in larger clusters, and is another reason why you
>>>> should never use the default cassandra user.
>>>>
>>>> -----Original Message-----
>>>> From: Jon Haddad [mailto:j...@jonhaddad.com]
>>>> Sent: Thursday, April 04, 2019 9:21 AM
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: Assassinate fails
>>>>
>>>> Ken,
>>>>
>>>> Alain is right about the system tables. What you're describing
>>>> only works on non-local tables. Changing the CL doesn't help with
>>>> keyspaces that use LocalStrategy. Here's the definition of the
>>>> system keyspace:
>>>>
>>>> CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}
>>>> AND durable_writes = true;
>>>>
>>>> Jon
>>>>
>>>> On Thu, Apr 4, 2019 at 9:03 AM Kenneth Brotman
>>>> <kenbrot...@yahoo.com.invalid> wrote:
>>>>>
>>>>> The trick below I got from the book Mastering Cassandra. You have
>>>>> to set the consistency to ALL for it to work. I thought you guys
>>>>> knew that one.
>>>>>
>>>>> From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
>>>>> Sent: Thursday, April 04, 2019 8:46 AM
>>>>> To: user cassandra.apache.org [1]
>>>>> Subject: Re: Assassinate fails
>>>>>
>>>>> Hi Alex,
>>>>>
>>>>> About the previous advice:
>>>>>
>>>>> You might have inconsistent data in your system tables. Try
>>>>> setting the consistency level to ALL, then do a read query of the
>>>>> system tables to force a repair.
>>>>>
>>>>> System tables use 'LocalStrategy', thus I don't think any repair
>>>>> would happen for the system.* tables, regardless of the
>>>>> consistency level you use. It should not harm, but I really think
>>>>> it won't help.
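The read-at-ALL trick from the Mastering Cassandra excerpt above can
be bundled into one cqlsh call. As Jon and Alain point out, it only
helps replicated keyspaces such as system_auth; system.* uses
LocalStrategy and cannot be repaired this way. The fallback echo
branch is my addition, for machines where cqlsh isn't on the PATH.

```shell
#!/bin/sh
# Run unbound SELECTs on every system_auth table at CONSISTENCY ALL to
# force read repair of the auth data. The statements are exactly those
# from the book excerpt (note: resource_role_permissons_index really is
# spelled that way in Cassandra's schema).
CQL='CONSISTENCY ALL;
SELECT COUNT(*) FROM system_auth.resource_role_permissons_index;
SELECT COUNT(*) FROM system_auth.role_permissions;
SELECT COUNT(*) FROM system_auth.role_members;
SELECT COUNT(*) FROM system_auth.roles;'

if command -v cqlsh >/dev/null 2>&1; then
  cqlsh -e "$CQL"
else
  echo "cqlsh not found; statements that would run:"
  echo "$CQL"
fi
```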
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: user-h...@cassandra.apache.org

Links:
------
[1] http://cassandra.apache.org