[ 
https://issues.apache.org/jira/browse/CASSANDRA-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

C. Scott Andreas updated CASSANDRA-3830:
----------------------------------------
    Component/s:     (was: Streaming and Messaging)
                 Distributed Metadata

> gossip-to-seeds is not obviously independent of failure detection algorithm 
> ----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3830
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3830
>             Project: Cassandra
>          Issue Type: Task
>          Components: Distributed Metadata
>            Reporter: Peter Schuller
>            Assignee: Peter Schuller
>            Priority: Minor
>
> The failure detector, ignoring all the theory, boils down to an
> extremely simple algorithm. The FD keeps track of a sliding window (of
> 1000 currently) intervals of heartbeat for a given host. Meaning, we
> have a track record of the last 1000 times we saw an updated heartbeat
> for a host.
> At any given moment, a host has a score which is simply the time since
> the last heartbeat, over the *mean* interval in the sliding
> window. For historical reasons a simple scaling factor is applied to
> this prior to checking the phi conviction threshold.
> (CASSANDRA-2597 has details, but thanks to Paul's work there it's now
> trivial to understand what it does based on gut feeling)
> So in effect, a host is considered down if we haven't heard from it in
> some time which is significantly longer than the "average" time we
> expect to hear from it.
> This seems reasonable, but it does assume that under normal conditions
> the average time between heartbeats does not change for reasons other
> than those that would be plausible reasons to think a node is
> unhealthy.
> This assumption *could* be violated by the gossip-to-seed
> feature. There is an argument to avoid gossip-to-seed for other
> reasons (see CASSANDRA-3829), but this is a concrete case in which the
> gossip-to-seed could cause a negative side-effect of the general kind
> mentioned in CASSANDRA-3829 (see notes at end about not case w/o seeds
> not being continuously tested). Normally, due to gossip to seed,
> everyone essentially sees latest information within very few hart
> beats (assuming only 2-3 seeds). But should all seeds be down,
> suddenly we flip a switch and start relying on generalized propagation
> in the gossip system, rather than the seed special case.
> The potential problem I forese here is that if the average propagation
> time suddenly spikes when all seeds become available, it could cause
> bogus flapping of nodes into down state.
> In order to test this, I deployeda ~ 180 node cluster with a version
> that logs heartbet information on each interpret(), similar to:
>  INFO [GossipTasks:1] 2012-02-01 23:29:58,746 FailureDetector.java (line 187) 
> ep /XXX.XXX.XXX.XXX is at phi 0.0019521638443084342, last interval 7.0, mean 
> is 1557.2777777777778
> It turns out that, at least at 180 nodes, with 4 seed nodes, whether
> or not seeds are running *does not* seem to matter significantly. In
> both cases, the mean interval is around 1500 milliseconds.
> I don't feel I have a good grasp of whether this is incidental or
> guaranteed, and it would be good to at least empirically test
> propagation time w/o seeds at differnet cluster sizes; it's supposed
> to be un-affected by cluster size ({{RING_DELAY}} is static for this
> reason, is my understanding). Would be nice to see this be the case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to