[jira] Commented: (CASSANDRA-768) "safe mode" for nodes so that they do not participate in reads until hinted handoff is complete

philo vivero (JIRA) Fri, 05 Feb 2010 14:13:00 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12830325#action_12830325
 ]


philo vivero commented on CASSANDRA-768:
----------------------------------------

Consider a node N(i) that goes down for a protracted length of time, then comes 
back up. The node now holds a significant amount of (i)nconsistent data.

We propose the following for the special case of a reader that has specified 
R=1:

Absent any information to the contrary (i.e., it does not know it is N(i) and 
thinks it is a regular node with nearly up-to-date information), node N(i) 
simply answers the request.

At some point, however, other nodes in the cluster notify this node that it is, 
in fact, N(i) and that it has HH queues that need application.

The first node tells N(i) that it has X transactions that need to be applied to 
be up-to-date and streams the updates to it. N(i) notes Q+=X and 
Q(t)=[timestamp of now].

Another node also tells N(i) about Y transactions and streams the updates. N(i) 
adds Q+=Y.

Once the time T elapsed since Q(t) exceeds [HH queue notification timeout], 
N(i) assumes it has been told about all the queued HH information it needs. 
Any further communiques from other nodes about N(i) needing HH queue updates 
are ignored.
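
The bookkeeping for Q and Q(t) described above could look roughly like the 
following. This is only a sketch: HintIntake, notify, window_closed and the 
use of seconds for timestamps are illustrative names and assumptions, not 
anything that exists in Cassandra.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HintIntake:
    """Hypothetical tracker for Q and Q(t) on a recovering node N(i)."""
    notification_timeout: float               # [HH queue notification timeout], seconds
    pending: int = 0                          # Q: hinted transactions still to apply
    first_notice_at: Optional[float] = None   # Q(t): time of the first notification

    def window_closed(self, now: float) -> bool:
        # The window closes once the time elapsed since Q(t) exceeds the timeout.
        return (self.first_notice_at is not None
                and now - self.first_notice_at > self.notification_timeout)

    def notify(self, count: int, now: float) -> bool:
        """Record a peer announcing `count` hints; return False if it is ignored."""
        if self.window_closed(now):
            return False                      # late communiques are ignored
        if self.first_notice_at is None:
            self.first_notice_at = now        # Q(t) = [timestamp of now]
        self.pending += count                 # Q += X, Q += Y, ...
        return True


intake = HintIntake(notification_timeout=30.0)
accepted_x = intake.notify(120, now=0.0)      # first node: X = 120
accepted_y = intake.notify(80, now=10.0)      # second node: Y = 80
accepted_late = intake.notify(50, now=45.0)   # arrives after the window closed
```

With the example values, only the first two notifications count toward Q.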

Now N(i) notes Q queued transactions must be applied to be "reasonably 
current." N(i) sets Q0(t)=[timestamp of now].

While N(i) has Q>0 transactions streaming to it from other nodes, if a reader 
comes to N(i) asking for a piece of data, N(i) will, with a short timeout 
("short" should be tuneable), attempt to get the data value from another node 
that has the data. If the other node doesn't answer within the timeout, N(i) 
answers as though authoritative. It might behoove N(i) to tell the other node 
to answer even if it also considers itself (i)nconsistent, to avoid a cascading 
failure.
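
A minimal sketch of that read path, assuming a hypothetical ask_peer callable 
that raises TimeoutError when the other node does not answer in time. The 
function name, the force_local flag and the 50 ms default are illustrative 
assumptions only.

```python
from typing import Callable, Dict, Optional

# Hypothetical peer call: (key, timeout_seconds, force_local) -> value.
# force_local=True asks the peer to answer from its own data even if it also
# considers itself inconsistent, which avoids a chain of forwarded reads.
PeerRead = Callable[[str, float, bool], str]


def read_while_catching_up(
    key: str,
    local_store: Dict[str, str],
    pending: int,
    ask_peer: PeerRead,
    peer_timeout: float = 0.05,   # the "short", tuneable timeout
) -> Optional[str]:
    """Serve an R=1 read on N(i), preferring a peer while Q > 0."""
    if pending > 0:
        try:
            return ask_peer(key, peer_timeout, True)
        except TimeoutError:
            pass                  # peer too slow: fall through and answer locally
    return local_store.get(key)   # answer as though authoritative


def fast_peer(key: str, timeout: float, force_local: bool) -> str:
    return "fresh"


def slow_peer(key: str, timeout: float, force_local: bool) -> str:
    raise TimeoutError("peer did not answer within the timeout")


local = {"k": "stale"}
from_peer = read_while_catching_up("k", local, 5, fast_peer)
from_self = read_while_catching_up("k", local, 5, slow_peer)
normal = read_while_catching_up("k", local, 0, fast_peer)
```

Once Q reaches zero the peer is never consulted, so a fully caught-up node 
pays no extra latency.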

If Q has not reduced to zero by the time the interval T elapsed since Q0(t) 
exceeds [queue reduction timeout], N(i) assumes it is no longer inconsistent 
and changes its state to normal. That is, it may know other HH queue data 
hasn't arrived, but sorry, timeout passed. Don't care anymore.
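
Taken together, the two timeouts imply three states for N(i). The sketch below 
is one possible reading, not a definitive design: the State names and state_at 
are hypothetical, and it assumes Q0(t) is set at the moment the notification 
window closes, i.e. Q0(t) = Q(t) + [HH queue notification timeout].

```python
from enum import Enum


class State(Enum):
    COLLECTING = "collecting"     # still accepting HH queue notifications
    CATCHING_UP = "catching_up"   # Q > 0, reads are tried on a peer first
    NORMAL = "normal"             # node answers reads as a regular node


def state_at(
    now: float,
    first_notice_at: float,       # Q(t)
    pending: int,                 # Q at time `now`
    notification_timeout: float,  # [HH queue notification timeout]
    reduction_timeout: float,     # [queue reduction timeout]
) -> State:
    """Derive N(i)'s state from the clock and the remaining queue size."""
    if now - first_notice_at <= notification_timeout:
        return State.COLLECTING
    # Assumption: Q0(t) is the instant the notification window closed.
    drain_started_at = first_notice_at + notification_timeout
    if pending > 0 and now - drain_started_at <= reduction_timeout:
        return State.CATCHING_UP
    # Either Q drained to zero, or the reduction timeout passed regardless.
    return State.NORMAL


collecting = state_at(10.0, 0.0, 200, 30.0, 120.0)
catching_up = state_at(60.0, 0.0, 50, 30.0, 120.0)
drained = state_at(60.0, 0.0, 0, 30.0, 120.0)
gave_up = state_at(200.0, 0.0, 50, 30.0, 120.0)
```

The last case is the "don't care anymore" branch: hints are still outstanding, 
but the node returns to normal because the reduction timeout has passed.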

It is our assertion that this will significantly reduce the amount of stale 
data served to readers who have specified R=1 on their read request, though 
the reader certainly cannot count on this, especially if the various timeouts 
are configurable by the administrator.


> "safe mode" for nodes so that they do not participate in reads until hinted 
> handoff is complete
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-768
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-768
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Robert Coli
>            Priority: Minor
>
> Summary :
> When using ConsistencyLevel.ONE for read performance reasons, stale data can 
> be served by a node which has been temporarily unavailable. When a node has 
> been unavailable for some time and other nodes have queued updates for it via 
> hinted handoff, it would be operationally useful to be able to configure the 
> node to not participate in read traffic until the hinted handoff process is 
> complete. A "safe mode" would offer operators a greater consistency guarantee 
> across all nodes without the per-read performance tradeoff of a higher 
> ConsistencyLevel. This "safe mode" concept might also be applicable to nodes 
> undergoing "repair" processes, for example in the case of on-disk data 
> corruption or loss.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
