[ 
https://issues.apache.org/jira/browse/CASSANDRA-634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jaakko Laine updated CASSANDRA-634:
-----------------------------------

    Attachment: 634-1st-part-gossip-about-all-nodes.patch

The way a node's alive/dead status is currently determined is as follows:

1st time there is gossip about a certain node:
- add new node to endpoint state and mark it alive (onAlive will be called)
- call onJoin

2nd and subsequent gossip about this node:
- notify the failure detector whenever there is gossip about this endpoint -> 
the failure detector starts monitoring this node and sets its status to dead 
if needed (it will not set it to alive)
- node is marked alive whenever there is gossip about it

The important things here are: (1) a node is assumed to be alive when the 1st 
info about it arrives and (2) the failure detector knows nothing about the 
node before the 2nd gossip. That means we cannot simply start gossiping info 
about dead nodes, as their status would remain "alive" forever (that is, until 
the dead node comes online and activates the failure detector).
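The flow above can be sketched roughly as follows (class and field names are illustrative only, not Cassandra's actual Gossiper/FailureDetector API):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the CURRENT behaviour. All names here are
// hypothetical; this is not the real Gossiper code.
class GossiperSketch {
    final Map<String, Boolean> endpointAlive = new HashMap<>();     // ~ endpoint state map
    final Set<String> monitoredByFailureDetector = new HashSet<>();

    void onGossip(String endpoint) {
        if (!endpointAlive.containsKey(endpoint)) {
            // 1st gossip: node is added and assumed ALIVE (onAlive/onJoin
            // fire); the failure detector is NOT told about it yet.
            endpointAlive.put(endpoint, true);
        } else {
            // 2nd and subsequent gossip: the failure detector starts
            // monitoring, and the node is marked alive again.
            monitoredByFailureDetector.add(endpoint);
            endpointAlive.put(endpoint, true);
        }
    }
}
```

The problem is visible in the first branch: for a node that is already dead, only that branch ever runs, so nothing can flip its status back to dead.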

Proposed fix (patch attached):

1st time gossip:
- add new node to endpoint state, but set its status to dead
- call onJoin (token metadata will be updated)

2nd and subsequent gossips:
- Unchanged. This 2nd gossip will trigger markAlive (and call onAlive) and 
activate the failure detector -> normal situation
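A minimal sketch of the proposed behaviour (again with made-up names, not the attached patch itself):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the PROPOSED behaviour: assume dead on first
// sight, let subsequent gossip prove liveness. Names are hypothetical.
class PatchedGossiperSketch {
    final Map<String, Boolean> endpointAlive = new HashMap<>();
    final Set<String> monitoredByFailureDetector = new HashSet<>();

    void onGossip(String endpoint) {
        if (!endpointAlive.containsKey(endpoint)) {
            // 1st gossip: add the node but mark it DEAD; onJoin still
            // fires, so token metadata gets updated as before.
            endpointAlive.put(endpoint, false);
        } else {
            // 2nd gossip: unchanged -- markAlive runs and the failure
            // detector takes over, so a live node flips to alive in seconds.
            monitoredByFailureDetector.add(endpoint);
            endpointAlive.put(endpoint, true);
        }
    }
}
```

A genuinely dead node now simply stays in the first state (known, but dead), which is exactly the desired outcome.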

In short: assume a node to be dead unless proven otherwise by subsequent 
gossip. If the node is alive, it will be marked so within seconds. If it is 
dead, we know about its existence, but we consider it (correctly) to be dead.

There is a possibility of a false "alive" interpretation, though: the cluster 
has nodes A, B and C. Suppose C has just gossiped to B and then dies. At this 
point C's status in A is different (older) than in B. Now suppose that at this 
instant node D enters the cluster and first gossips with A. In this case D 
will get the old gossip and only later the new one. This second, newer gossip 
will cause C to be marked alive even though it is already dead. However, since 
the second gossip will also activate the failure detector, C will be correctly 
marked as dead within a few seconds, so this is probably OK (and in any case a 
very rare occurrence).

Two open issues:
- Now that we're gossiping about dead nodes as well, the gossip digest keeps 
growing without bound as nodes come and go. This information will never 
disappear, as it will be propagated to new nodes no matter how old and 
obsolete it is. To counter this, we need some mechanism to (1) remove a dead 
node from endpointstateinfo, or (2) at some point stop gossiping about it, or 
both.

For (1): when we get a removetoken command, it is probably safe to remove the 
endpoint immediately (STATE_LEFT is broadcast by a different endpoint, so info 
about the token removal will remain in the gossiper). Another thing we could 
do is keep track of nodes that have left. If nothing is heard about such a 
node for some time, we could assume it is gone for good and remove it from the 
gossiper after giving its STATE_LEFT enough time to spread.

For (2): we could gossip info only about nodes in either liveEndpoints or 
unreachableEndpoints (as opposed to endPointStateMap). Nodes are removed from 
unreachableEndpoints after three days of silence, so this would discard old 
information from the gossiper. A side effect, of course, would be that a node 
that is down for more than three days but comes back later might miss some of 
its data (new nodes that booted after the three-day period would know nothing 
about this node).
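Option (2) amounts to building the digest from the union of the two sets rather than the full state map; a sketch (the helper name is made up, only liveEndpoints/unreachableEndpoints correspond to real Gossiper fields):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of option (2): derive gossip digest candidates from
// liveEndpoints + unreachableEndpoints instead of endPointStateMap, so
// endpoints silent past the unreachable timeout drop out automatically.
class DigestBuilder {
    static List<String> digestEndpoints(Set<String> liveEndpoints,
                                        Set<String> unreachableEndpoints) {
        Set<String> candidates = new LinkedHashSet<>(liveEndpoints);
        candidates.addAll(unreachableEndpoints);  // union, duplicates collapse
        return new ArrayList<>(candidates);
    }
}
```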

(The attached patch should work as is, but does not take these last two 
issues into account.)


> Hinted Handoff Exception
> ------------------------
>
>                 Key: CASSANDRA-634
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-634
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.5
>            Reporter: Chris Goffinet
>             Fix For: 0.5
>
>         Attachments: 634-1st-part-gossip-about-all-nodes.patch
>
>
> Updated to the latest codebase from cassandra-0.5 branch. All nodes booted up 
> fine and then I start noticing this error:
> ERROR [HINTED-HANDOFF-POOL:1] 2009-12-14 22:05:34,191 CassandraDaemon.java 
> (line 71) Fatal exception in thread Thread[HINTED-HANDOFF-POOL:1,5,main]
> java.lang.RuntimeException: java.lang.NullPointerException
>         at 
> org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:253)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.NullPointerException
>         at 
> org.apache.cassandra.gms.FailureDetector.isAlive(FailureDetector.java:146)
>         at 
> org.apache.cassandra.db.HintedHandOffManager.sendMessage(HintedHandOffManager.java:106)
>         at 
> org.apache.cassandra.db.HintedHandOffManager.deliverAllHints(HintedHandOffManager.java:177)
>         at 
> org.apache.cassandra.db.HintedHandOffManager.access$000(HintedHandOffManager.java:75)
>         at 
> org.apache.cassandra.db.HintedHandOffManager$1.run(HintedHandOffManager.java:249)
>         ... 3 more
