[ https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734586#comment-14734586 ]

Stefania commented on CASSANDRA-10231:
--------------------------------------

This is not going to be easy to reproduce with a dtest without injecting some
failure into the code. So far I was able to observe this interesting transition
by issuing repeated nodetool status commands during a decommission, but I was
lucky: I only saw it once out of several attempts:

{code}
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns    Host ID                               Rack
UL  127.0.0.1  57.39 KB   256          ?       1b91a92c-58b7-470f-82eb-f1e05fc50636  rack1
UN  127.0.0.2  90.56 KB   256          ?       4287fd68-e53d-4b9e-a48b-af374f9e69b3  rack1
UN  127.0.0.3  52.56 KB   256          ?       35a94edb-b38a-4bf3-8318-e14bb8a59eef  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns    Host ID                               Rack
UL  127.0.0.1  57.39 KB   256          ?       null                                  rack1
UN  127.0.0.2  90.56 KB   256          ?       4287fd68-e53d-4b9e-a48b-af374f9e69b3  rack1
UN  127.0.0.3  52.56 KB   256          ?       35a94edb-b38a-4bf3-8318-e14bb8a59eef  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns    Host ID                               Rack
UN  127.0.0.2  90.56 KB   256          ?       4287fd68-e53d-4b9e-a48b-af374f9e69b3  rack1
UN  127.0.0.3  52.56 KB   256          ?       35a94edb-b38a-4bf3-8318-e14bb8a59eef  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless
{code}

Because of this observed transition, we know that at some point during the
decommission the host id must be null, which means it must be updated to null
in {{system.peers}}. My assumption was that if the node crashes while the host
id is null in {{system.peers}}, but before the entry is removed entirely, this
behavior might be observed. So I patched the C* code not to save the host id in
{{system.peers}}, and when I did I got this, which is close but not identical:

{code}
Final status from node 2
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens       Owns    Host ID                               Rack
UL  127.0.0.1  63.71 KB   256          ?       null                                  rack1
UN  127.0.0.2  102.39 KB  256          ?       c897de6b-9ec8-4fe2-9835-60bf812c0b22  rack1
{code}

I also saw this exception:

{code}
ERROR [GossipStage:1] 2015-09-08 18:10:22,590 CassandraDaemon.java:191 - Exception in thread Thread[GossipStage:1,5,main]
java.lang.NullPointerException: null
        at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) ~[na:1.8.0_60]
        at org.apache.cassandra.hints.HintsCatalog.get(HintsCatalog.java:85) ~[main/:na]
        at org.apache.cassandra.hints.HintsService.excise(HintsService.java:267) ~[main/:na]
        at org.apache.cassandra.service.StorageService.excise(StorageService.java:2129) ~[main/:na]
        at org.apache.cassandra.service.StorageService.excise(StorageService.java:2141) ~[main/:na]
        at org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:2046) ~[main/:na]
        at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1660) ~[main/:na]
        at org.apache.cassandra.gms.Gossiper.doOnChangeNotifications(Gossiper.java:1191) ~[main/:na]
        at org.apache.cassandra.gms.Gossiper.applyNewStates(Gossiper.java:1173) ~[main/:na]
        at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1130) ~[main/:na]
        at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) ~[main/:na]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) ~[main/:na]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_60]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_60]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_60]
{code}
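The NPE itself is unsurprising once a null host id reaches {{HintsCatalog.get}}: unlike {{HashMap}}, {{ConcurrentHashMap}} rejects null keys outright. A minimal standalone illustration (not Cassandra code; the map contents are made up):

```java
import java.util.concurrent.ConcurrentHashMap;

// Unlike HashMap, ConcurrentHashMap forbids null keys and values, so
// get(null) throws NullPointerException immediately. This matches the
// stack trace above, where a null host id is passed down to
// ConcurrentHashMap.get via HintsCatalog.get.
public class NullKeyDemo
{
    public static void main(String[] args)
    {
        ConcurrentHashMap<String, String> stores = new ConcurrentHashMap<>();
        stores.put("4287fd68-e53d-4b9e-a48b-af374f9e69b3", "hints");
        try
        {
            stores.get(null);
            System.out.println("no exception");
        }
        catch (NullPointerException e)
        {
            System.out.println("NPE on null key");
        }
    }
}
```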

Here is the [wip 
dtest|https://github.com/stef1927/cassandra-dtest/commits/10231] but it only 
works by changing the C* source code as follows:

{code}
stefi@lila:~/git/cstar/cassandra$ git diff
diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java
index 2d9bbec..b84bcf5 100644
--- a/src/java/org/apache/cassandra/service/StorageService.java
+++ b/src/java/org/apache/cassandra/service/StorageService.java
@@ -1701,7 +1701,7 @@ public class StorageService extends NotificationBroadcasterSupport implements IE
                         MigrationManager.instance.scheduleSchemaPull(endpoint, epState);
                         break;
                     case HOST_ID:
-                        SystemKeyspace.updatePeerInfo(endpoint, "host_id", UUID.fromString(value.value));
+                        //SystemKeyspace.updatePeerInfo(endpoint, "host_id", UUID.fromString(value.value));
                         break;
                     case RPC_READY:
                         notifyRpcChange(endpoint, epState.isRpcReady());
@@ -1741,7 +1741,7 @@ public class StorageService extends NotificationBroadcasterSupport implements IE
                     SystemKeyspace.updatePeerInfo(endpoint, "schema_version", UUID.fromString(entry.getValue().value));
                     break;
                 case HOST_ID:
-                    SystemKeyspace.updatePeerInfo(endpoint, "host_id", UUID.fromString(entry.getValue().value));
+                    //SystemKeyspace.updatePeerInfo(endpoint, "host_id", UUID.fromString(entry.getValue().value));
                     break;
             }
         }
{code}

This code in {{SS.initServer()}} is suspect:

{code}
        if (Boolean.parseBoolean(System.getProperty("cassandra.load_ring_state", "true")))
        {
            logger.info("Loading persisted ring state");
            Multimap<InetAddress, Token> loadedTokens = SystemKeyspace.loadTokens();
            Map<InetAddress, UUID> loadedHostIds = SystemKeyspace.loadHostIds();
            for (InetAddress ep : loadedTokens.keySet())
            {
                if (ep.equals(FBUtilities.getBroadcastAddress()))
                {
                    // entry has been mistakenly added, delete it
                    SystemKeyspace.removeEndpoint(ep);
                }
                else
                {
                    if (loadedHostIds.containsKey(ep))
                        tokenMetadata.updateHostId(loadedHostIds.get(ep), ep);
                    Gossiper.instance.addSavedEndpoint(ep);
                }
            }
        }
{code}

The endpoint is added as a saved endpoint even when there is no host id for it,
so this might explain the problem, but I still need to investigate further.
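For what it's worth, one possible direction would be to guard that loop so an endpoint with no persisted host id is not re-added to the ring state. The sketch below simulates only the guard logic with plain collections (hypothetical simplification using {{InetAddress}} and a list; not the actual {{StorageService}} types, and not a tested fix):

```java
import java.net.InetAddress;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Simulates the ring-state loading loop with a proposed guard: endpoints
// whose host id was never persisted (e.g. nulled mid-decommission) are
// skipped instead of being handed to the gossiper as saved endpoints.
public class RingStateLoadDemo
{
    public static void main(String[] args) throws Exception
    {
        // Persisted state after the hypothetical crash: two endpoints have
        // tokens, but only one still has a host id on disk.
        Set<InetAddress> endpointsWithTokens = new LinkedHashSet<>();
        Map<InetAddress, UUID> loadedHostIds = new HashMap<>();

        InetAddress leaving = InetAddress.getByName("127.0.0.1"); // host id nulled during decommission
        InetAddress healthy = InetAddress.getByName("127.0.0.2");
        endpointsWithTokens.add(leaving);
        endpointsWithTokens.add(healthy);
        loadedHostIds.put(healthy, UUID.randomUUID()); // no entry for 'leaving'

        List<InetAddress> savedEndpoints = new ArrayList<>();
        for (InetAddress ep : endpointsWithTokens)
        {
            if (!loadedHostIds.containsKey(ep))
            {
                // Guarded path: don't resurrect an endpoint with a null host id.
                System.out.println("skipping endpoint with no persisted host id: " + ep.getHostAddress());
                continue;
            }
            savedEndpoints.add(ep);
        }
        System.out.println("saved endpoints: " + savedEndpoints.size());
    }
}
```

Whether the gossiper should still learn about such an endpoint (or the stale {{system.peers}} row should be deleted instead) is exactly the part that needs investigating.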


> Null status entries on nodes that crash during decommission of a different 
> node
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: Stefania
>             Fix For: 3.0.x
>
>
> This issue is reproducible through a Jepsen test of materialized views that 
> crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during 
> the decommission of a different node, it may start with a null entry for the 
> decommissioned node like so:
> DN 10.0.0.5 ? 256 ? null rack1
> This entry does not get updated/cleared by gossip. This entry is removed upon 
> a restart of the affected node.
> This issue is further detailed in ticket 
> [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
