[ https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14934733#comment-14934733 ]

Joel Knighton commented on CASSANDRA-10231:
-------------------------------------------

I've attached the logs for n1, n2, n3, n4, and n5. n1 is at 10.0.0.2, n2 is at 
10.0.0.3, and so on.

The decommissioned node is n2 (10.0.0.3), and the node with the null status
entry is n5. The nodetool status output on n5 looks like this:

Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens       Owns    Host ID                               Rack
UN  10.0.0.2  480.16 KB  256          ?       7a7681f5-0a22-4ba2-89c4-17c84658a18f  rack1
?N  10.0.0.3  ?          256          ?       null                                  rack1
UN  10.0.0.4  495.24 KB  256          ?       ef529827-e178-49f8-ad3a-458198df5060  rack1
UN  10.0.0.5  374.78 KB  256          ?       ee63423d-1204-496e-b53d-d318472717ab  rack1
UN  10.0.0.6  456.69 KB  256          ?       d88d166b-ed03-4b48-a12e-ea849f680920  rack1
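For spotting this condition in other runs, here is a minimal sketch that scans nodetool status output like the above and reports rows whose Host ID is null. The function name find_null_host_ids is hypothetical (it is not part of Cassandra or nodetool), and the parsing assumes the column layout shown above (Load, Tokens, Owns, Host ID, Rack) with IPv4 addresses. Treating "?" as an unknown status is inferred from the ?N row, not taken from nodetool documentation.

```python
import re

# Hypothetical helper, not part of Cassandra or nodetool.
# "?" as "Unknown" is an assumption based on the ?N row in the output above.
STATUS = {"U": "Up", "D": "Down", "?": "Unknown"}
STATE = {"N": "Normal", "L": "Leaving", "J": "Joining", "M": "Moving"}

# A data row starts with a two-character status/state code, then an IPv4 address.
ROW = re.compile(r"^([UD?])([NLJM])\s+(\d+\.\d+\.\d+\.\d+)\s+(.*)$")


def find_null_host_ids(output: str):
    """Return (address, status, state) for every row whose Host ID is 'null'."""
    suspicious = []
    for line in output.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # skip the datacenter header, legend, and column header
        status, state, address, rest = m.groups()
        # Remaining columns: Load (may contain a space, e.g. "480.16 KB"),
        # Tokens, Owns, Host ID, Rack. Host ID is therefore the second-to-last field.
        fields = rest.split()
        if len(fields) >= 2 and fields[-2] == "null":
            suspicious.append((address, STATUS[status], STATE[state]))
    return suspicious
```

Run against the output above, it flags only the 10.0.0.3 row, reported as ("10.0.0.3", "Unknown", "Normal").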

As I mentioned last week, I'm tracking down a materialized view (MV) issue that,
on 3.0, makes the tests fail before they reach this point. To work around it, I
applied your patch to commit e5c14285404b1ba98d385c5e5ed069229a2f6004, the commit
on which I originally reproduced the issue.

Sorry for the delay.

> Null status entries on nodes that crash during decommission of a different node
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: Stefania
>             Fix For: 3.0.0 rc2
>
>         Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that
> crashes and decommissions nodes throughout the test.
> In a 5-node cluster, if a node crashes at a certain (as yet unknown) point
> during the decommission of a different node, it may start up with a null entry
> for the decommissioned node, like so:
> DN 10.0.0.5 ? 256 ? null rack1
> Gossip does not update or clear this entry. The entry is removed when the
> affected node is restarted.
> This issue is described in more detail in ticket
> [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
