Hello Alain, long time - I had to wait for a quiet week to try this. I
finally did, and I thought I'd give you some feedback.
Short reminder: one of the nodes of my 3.9 cluster died and I replaced
it. But it still appeared in nodetool status, on one node with a "null"
host_id and on another with the same host_id as its replacement.
nodetool assassinate failed, and I could not decommission or remove any
other node in the cluster.

Basically, after taking a backup and preparing another cluster in case
anything went wrong, I ran:

DELETE FROM system.peers WHERE peer = '192.168.1.18';

and restarted Cassandra on the two nodes still seeing the zombie node.
After the first restart, the Cassandra system.log was filled with:

WARN  [MutationStage-2] 2019-08-15 15:31:44,735 AbstractLocalAwareExecutorService.java:169 - Uncaught exception on thread Thread[MutationStage-2,5,main]:
java.lang.NullPointerException: null

So... I restarted again. The error disappeared. I ran a full repair and
everything seems to be back in order. I could decommission a node
without problem.

Thanks for your help !

Alex

On 05.04.2019 at 10:55, Alain RODRIGUEZ wrote:
> Alex,
>
>> Well, I tried : rolling restart did not work its magic.
>
> Sorry to hear that, and for misleading you. My faith in the rolling
> restart's magical power went down a bit, but I still think it was
> worth a try :D.
>
>> @ Alain : In system.peers I see both the dead node and its
>> replacement with the same ID :
>>
>>  peer         | host_id
>> --------------+--------------------------------------
>>  192.168.1.18 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>>  192.168.1.22 | 09d24557-4e98-44c3-8c9d-53c4c31066e1
>>
>> Is it expected ?
>>
>> If I cannot fix this, I think I will add new nodes and remove, one by
>> one, the nodes that show the dead node in nodetool status.
>
> Well, no. This is clearly not good or expected, I would say.
>
> TL;DR - SUGGESTED FIX:
> What I would try, to fix this, is removing this row. It *should* be
> safe, but that's only my opinion, and only on the condition that you
> remove *only* the 'ghost/dead' nodes. Any mistake here would probably
> be costly.
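For anyone landing on this thread with the same zombie-node symptoms,
the sequence that worked above can be sketched as a script. The IP is
the one from this thread; the DRY_RUN guard is my addition, so the
script only prints the commands until you deliberately flip it. Treat
it as a sketch of the steps, not a turnkey fix.

```shell
#!/bin/sh
# Sketch of the recovery sequence from this thread. Assumes the dead
# node is 192.168.1.18 and that you run this on each node still listing
# the zombie in `nodetool status`. With DRY_RUN=1 (the default) it only
# prints the commands it would run.
DRY_RUN=${DRY_RUN:-1}
DEAD_IP="192.168.1.18"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "WOULD RUN: $*"
  else
    "$@"
  fi
}

# 1. Snapshot the system keyspace first -- messing with system tables
#    without a backup is how clusters die.
run nodetool snapshot system

# 2. Check what system.peers actually contains for the dead node.
run cqlsh -e "SELECT peer, host_id FROM system.peers WHERE peer = '$DEAD_IP';"

# 3. Remove the stale row (only the dead node's row!).
run cqlsh -e "DELETE FROM system.peers WHERE peer = '$DEAD_IP';"

# 4. Restart Cassandra on this node. In this thread a *second* restart
#    was needed to clear NullPointerExceptions in MutationStage, and a
#    full repair followed once the errors were gone.
run sudo systemctl restart cassandra
```

The restart command assumes a systemd install; substitute whatever your
distribution uses.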
> Again, be aware that you're touching a sensitive part when messing
> with system tables. Think twice, check twice, take a copy of the
> SSTables / a snapshot. Then I would go for it and observe the changes
> on one node first. If no harm is done, continue to the next node.
>
> Considering the old node is '192.168.1.18', I would run this on all
> nodes (maybe after testing on one node) to make it simple, or just
> run it on the nodes that show the ghost node(s):
>
> "DELETE FROM system.peers WHERE peer = '192.168.1.18';"
>
> Maybe you will need to restart; I think you won't even need that. I
> have good hope that this should finally fix your issue with no harm.
>
> MORE CONTEXT - IDEA OF THE PROBLEM:
> The above is clearly an issue, I would say, and most probably the
> source of your troubles here. The problem is that I lack
> understanding. From where I stand, this kind of bug should not happen
> anymore in Cassandra (I have not seen anything similar for a while).
>
> I would blame:
> - A corner case scenario (unlikely, the system tables have been
> rather solid for a while). Or maybe you are using an old C* version.
> It *might* be related to this (or something similar):
> https://issues.apache.org/jira/browse/CASSANDRA-7122
> - A really weird operation (a succession of actions might have put
> you in this state, but it is hard for me to say what)
> - KairosDB? I don't know it or what it does. Might it be less
> reliable than Cassandra is, and have led to this issue? Maybe, I have
> no clue once again.
>
> RISK OF THIS OPERATION AND CURRENT SITUATION:
> Also, I *think* the current situation is relatively 'stable' (maybe
> just some hints being stored for nothing, and possibly not being able
> to add more nodes or change the schema?). This is the kind of
> situation where 'rushing' a solution without understanding the
> impacts and risks can make things go terribly wrong. Take the time to
> analyse my suggested fix, maybe read the ticket above, etc.
> When you're ready, back up the data, prepare the DELETE command
> carefully, and observe how one node reacts to the fix first.
>
> As you can see, I think it's the 'good' fix, but I'm not comfortable
> with this operation. And you should not be either :).
> I would say, arbitrarily, to share my feeling about this operation,
> that there is a 95% chance this does not hurt and a 90% chance to fix
> the issue with that; but if something goes wrong, if we are in the 5%
> where it does not go well, there is a non-negligible probability that
> you will destroy your cluster in a very bad way. I guess what I am
> trying to say is: be careful, watch your step, make sure you remove
> the right row, and ensure it works on one node with no harm.
> I shared my feeling and I would try this fix. But it's ultimately
> your responsibility, and I won't be behind the machine when you fix
> it. None of us will.
>
> Good luck ! :)
>
> C*heers,
>
> -----------------------
> Alain Rodriguez - al...@thelastpickle.com
> France / Spain
>
> The Last Pickle - Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On Thu, Apr 4, 2019 at 19:29, Kenneth Brotman
> <kenbrot...@yahoo.com.invalid> wrote:
>
>> Alex,
>>
>> According to this TLP article
>> http://thelastpickle.com/blog/2018/09/18/assassinate.html :
>>
>> Note that the LEFT status should stick around for 72 hours to ensure
>> all nodes come to the consensus that the node has been removed. So
>> please don't rush things if that's the case. Again, it's only
>> cosmetic.
>>
>> If a gossip state will not forget a node that was removed from the
>> cluster more than a week ago:
>>
>> - Log in to each node within the Cassandra cluster.
>> - Download jmxterm on each node, if nodetool assassinate is not an
>>   option.
>> - Run nodetool assassinate, or the unsafeAssassinateEndpoint
>>   command, multiple times in quick succession.
>> - I typically recommend running the command 3-5 times within 2
>>   seconds.
>> I understand that sometimes the command takes time to return, so the
>> "2 seconds" suggestion is less of a requirement than it is a
>> mindset. Also, sometimes 3-5 times isn't enough. In such cases,
>> shoot for the moon and try 20 assassination attempts in quick
>> succession.
>>
>> What we are trying to do is create a flood of messages requesting
>> that all nodes completely forget there used to be an entry within
>> the gossip state for the given IP address. If each node can prune
>> its own gossip state and broadcast that to the rest of the nodes, we
>> should eliminate any race conditions that may exist where at least
>> one node still remembers the given IP address.
>>
>> As soon as all nodes come to agreement that they don't remember the
>> deprecated node, the cosmetic issue will no longer be a concern in
>> any system.logs, nodetool describecluster commands, or nodetool
>> gossipinfo output.
>>
>> -----Original Message-----
>> From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID]
>> Sent: Thursday, April 04, 2019 10:40 AM
>> To: user@cassandra.apache.org
>> Subject: RE: Assassinate fails
>>
>> Alex,
>>
>> Did you remove the option
>> JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=address_of_dead_node"
>> after the node started, and then restart the node again?
>>
>> Are you sure there isn't a typo in the file?
>>
>> Ken
>>
>> -----Original Message-----
>> From: Kenneth Brotman [mailto:kenbrot...@yahoo.com.INVALID]
>> Sent: Thursday, April 04, 2019 10:31 AM
>> To: user@cassandra.apache.org
>> Subject: RE: Assassinate fails
>>
>> I see; system_auth is a separate keyspace.
>>
>> -----Original Message-----
>> From: Jon Haddad [mailto:j...@jonhaddad.com]
>> Sent: Thursday, April 04, 2019 10:17 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Assassinate fails
>>
>> No, it can't. As Alain (and I) have said, since the system keyspace
>> uses LocalStrategy, it's not replicated, and thus can't be repaired.
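Kenneth's "assassinate flood" advice quoted above can be scripted. The
loop below is a sketch assuming the same dead IP as in this thread;
the DRY_RUN guard is mine, and by default the script only prints what
it would run.

```shell
#!/bin/sh
# Sketch of the "assassinate flood" from the TLP article quoted above:
# fire nodetool assassinate several times in quick succession so every
# node prunes its gossip state before a race can resurrect the entry.
# DRY_RUN=1 (the default) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
DEAD_IP="192.168.1.18"
ATTEMPTS=5   # 3-5 is the usual advice; up to 20 for stubborn entries

i=1
while [ "$i" -le "$ATTEMPTS" ]; do
  if [ "$DRY_RUN" = "1" ]; then
    echo "WOULD RUN: nodetool assassinate $DEAD_IP"
  else
    # Background the calls so they land in quick succession.
    nodetool assassinate "$DEAD_IP" &
  fi
  i=$((i + 1))
done
wait
```

Backgrounding the calls is my reading of "within 2 seconds": the
article frames it as a mindset, so sequential calls may be fine too.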
>>
>> On Thu, Apr 4, 2019 at 9:54 AM Kenneth Brotman
>> <kenbrot...@yahoo.com.invalid> wrote:
>>>
>>> Right, it could be a similar issue; it's the same type of fix
>>> though.
>>>
>>> -----Original Message-----
>>> From: Jon Haddad [mailto:j...@jonhaddad.com]
>>> Sent: Thursday, April 04, 2019 9:52 AM
>>> To: user@cassandra.apache.org
>>> Subject: Re: Assassinate fails
>>>
>>> System != system_auth.
>>>
>>> On Thu, Apr 4, 2019 at 9:43 AM Kenneth Brotman
>>> <kenbrot...@yahoo.com.invalid> wrote:
>>>>
>>>> From Mastering Cassandra:
>>>>
>>>> Forcing read repairs at consistency - ALL
>>>>
>>>> This type of repair isn't really part of the Apache Cassandra
>>>> repair paradigm at all. When it was discovered that a read repair
>>>> will trigger 100% of the time when a query is run at ALL
>>>> consistency, this method of repair started to gain popularity in
>>>> the community. In some cases, this method of forcing data
>>>> consistency provided better results than normal, scheduled
>>>> repairs.
>>>>
>>>> Let's assume, for a second, that an application team is having a
>>>> hard time logging into a node in a new data center. You try to
>>>> cqlsh out to these nodes, and notice that you are also
>>>> experiencing intermittent failures, leading you to suspect that
>>>> the system_auth tables might be missing a replica or two. On one
>>>> node you do manage to connect successfully using cqlsh. One quick
>>>> way to fix consistency on the system_auth tables is to set
>>>> consistency to ALL, and run an unbound SELECT on every table,
>>>> tickling each record:
>>>>
>>>> use system_auth ;
>>>> consistency ALL;
>>>> Consistency level set to ALL.
>>>>
>>>> SELECT COUNT(*) FROM resource_role_permissons_index ;
>>>> SELECT COUNT(*) FROM role_permissions ;
>>>> SELECT COUNT(*) FROM role_members ;
>>>> SELECT COUNT(*) FROM roles;
>>>>
>>>> This problem is often seen when logging in with the default
>>>> cassandra user.
>>>> Within cqlsh, there is code that forces the default cassandra user
>>>> to connect by querying system_auth at QUORUM consistency. This can
>>>> be problematic in larger clusters, and is another reason why you
>>>> should never use the default cassandra user.
>>>>
>>>> -----Original Message-----
>>>> From: Jon Haddad [mailto:j...@jonhaddad.com]
>>>> Sent: Thursday, April 04, 2019 9:21 AM
>>>> To: user@cassandra.apache.org
>>>> Subject: Re: Assassinate fails
>>>>
>>>> Ken,
>>>>
>>>> Alain is right about the system tables. What you're describing
>>>> only works on non-local tables. Changing the CL doesn't help with
>>>> keyspaces that use LocalStrategy. Here's the definition of the
>>>> system keyspace:
>>>>
>>>> CREATE KEYSPACE system WITH replication = {'class': 'LocalStrategy'}
>>>> AND durable_writes = true;
>>>>
>>>> Jon
>>>>
>>>> On Thu, Apr 4, 2019 at 9:03 AM Kenneth Brotman
>>>> <kenbrot...@yahoo.com.invalid> wrote:
>>>>>
>>>>> The trick below I got from the book Mastering Cassandra. You have
>>>>> to set the consistency to ALL for it to work. I thought you guys
>>>>> knew that one.
>>>>>
>>>>> From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
>>>>> Sent: Thursday, April 04, 2019 8:46 AM
>>>>> To: user cassandra.apache.org [1]
>>>>> Subject: Re: Assassinate fails
>>>>>
>>>>> Hi Alex,
>>>>>
>>>>> About the previous advice:
>>>>>
>>>>> You might have inconsistent data in your system tables. Try
>>>>> setting the consistency level to ALL, then do a read query of the
>>>>> system tables to force a repair.
>>>>>
>>>>> System tables use 'LocalStrategy', thus I don't think any repair
>>>>> would happen for the system.* tables, regardless of the
>>>>> consistency level you use. It should not harm, but I really think
>>>>> it won't help.
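The read-at-ALL trick from the Mastering Cassandra excerpt above can
be bundled into one cqlsh call. As Jon and Alain point out, it only
helps replicated keyspaces such as system_auth; system.* uses
LocalStrategy and cannot be repaired this way. The fallback echo
branch is my addition, for machines where cqlsh isn't on the PATH.

```shell
#!/bin/sh
# Run unbound SELECTs on every system_auth table at CONSISTENCY ALL to
# force read repair of the auth data. The statements are exactly those
# from the book excerpt (note: resource_role_permissons_index really is
# spelled that way in Cassandra's schema).
CQL='CONSISTENCY ALL;
SELECT COUNT(*) FROM system_auth.resource_role_permissons_index;
SELECT COUNT(*) FROM system_auth.role_permissions;
SELECT COUNT(*) FROM system_auth.role_members;
SELECT COUNT(*) FROM system_auth.roles;'

if command -v cqlsh >/dev/null 2>&1; then
  cqlsh -e "$CQL"
else
  echo "cqlsh not found; statements that would run:"
  echo "$CQL"
fi
```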
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
>>>> For additional commands, e-mail: user-h...@cassandra.apache.org

Links:
------
[1] http://cassandra.apache.org