It seems you're correct that writes don't propagate to a node that has
join_ring set to false, so I'd say this is a flaw. In practice I can't see
many real use cases for node outages with the current implementation. The
main use, I'd think, would be to have additional coordinators for
CPU-heavy workloads.
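
To put numbers on the gap, here is a minimal Python sketch of the timeline
from the original message (5 hours down, 7 hours of repair). The function
name, the calendar date and the 08:00 start time are made up for
illustration. It makes two simplifying assumptions: repair only covers data
written before repair started, and the node receives no writes while
join_ring is false.

```python
from datetime import datetime, timedelta


def unrepaired_gap(down_at, outage, repair):
    """Return (start, end) of the window whose writes the node still lacks
    at the moment it joins the ring.

    Simplifying assumptions (not taken from Cassandra itself): repair
    covers only data written before repair started, and the node gets no
    writes while join_ring=false.
    """
    repair_start = down_at + outage      # node restarted with join_ring=false
    join_time = repair_start + repair    # node finally joins the ring
    return repair_start, join_time


# Hypothetical date and start time; durations are from the original message.
start, end = unrepaired_gap(
    down_at=datetime(2016, 12, 13, 8, 0),
    outage=timedelta(hours=5),
    repair=timedelta(hours=7),
)
print(end - start)  # 7:00:00
```

So under those assumptions the node joins with a gap exactly as long as the
repair itself, however short the original outage was.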

To make it actually useful for repairs/outages, it seems we'd need another
option that turns on writes, so that the node behaves similarly to write
survey mode (but on an already bootstrapped node).

Is there a reason we don't have this already? Or does it exist somewhere
I'm not aware of?
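
For the snapshot example in the quoted message below, a small Python sketch
shows why repairing A before it joins matters at QUORUM. The node names and
the set of replicas holding the 2-3 PM writes come from that example; the
function name is made up for illustration, and the model ignores read
repair.

```python
from itertools import combinations

NODES = ("A", "B", "C")
QUORUM = len(NODES) // 2 + 1  # 2 of 3 replicas


def stale_quorums(holders):
    """Quorum-sized replica sets in which no node holds the 2-3 PM writes."""
    return [q for q in combinations(NODES, QUORUM) if not set(q) & holders]


# A joins straight from the 10 AM snapshot: only C has the 2-3 PM writes.
print(stale_quorums({"C"}))       # [('A', 'B')]

# A is repaired with join_ring=false before joining: A and C have them.
print(stale_quorums({"A", "C"}))  # []
```

Without the repair, a QUORUM read answered by A and B misses the 2-3 PM
writes; with it, every quorum contains at least one replica that has them.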

On 20 December 2016 at 17:40, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:

> No responses yet :)
>
> Any C* expert who could help on join_ring use case and the concern raised?
>
> Thanks
> Anuj
>
> On Tue, 13 Dec, 2016 at 11:31 PM, Anuj Wadehra
> <anujw_2...@yahoo.co.in> wrote:
> Hi,
>
> I need to understand the use case of join_ring=false in case of node
> outages. As per https://issues.apache.org/jira/browse/CASSANDRA-6961, you
> would want join_ring=false when you have to repair a node before bringing a
> node back after some considerable outage. The problem I see with
> join_ring=false is that unlike autobootstrap, the node will NOT accept
> writes while you are running repair on it. If a node was down for 5 hours
> and you bring it back with join_ring=false, repair the node for 7 hours and
> then make it join the ring, it will STILL have missed writes because during
> the time repair was running (7 hrs), writes only went to the other nodes.
> So, if you want to make sure that reads served by the restored node at CL
> ONE will return consistent data after the node has joined, you won't get
> that, as writes were missed while the node was being repaired. And if
> you work with Read/Write CL=QUORUM, even if you bring back the node without
> join_ring=false, you would get the desired consistency anyway. So how
> would join_ring provide any additional consistency in this case?
>
> I can see join_ring=false being useful only when I am restoring from a
> snapshot or bootstrapping and there are dropped mutations in my cluster
> which are not fixed by hinted handoff.
>
> For Example: 3 nodes A,B,C working at Read/Write CL QUORUM. Hinted
> Handoff=3 hrs.
> 10 AM: Snapshot taken on all 3 nodes
> 11 AM: Node B goes down for 4 hours
> 3 PM: Node B comes up but data is not repaired. So, 1 hr of dropped
> mutations (2-3 PM) not fixed via Hinted Handoff.
> 5 PM: Node A crashes.
> 6 PM: Node A restored from 10 AM Snapshot, Node A started with
> join_ring=false, repaired and then joined the cluster.
>
> In the snapshot restore example above, updates from 2-3 PM were outside the
> hinted handoff window of 3 hours. Thus, node B won't get those updates. Node
> A's data for 2-3 PM is already lost. So the 2-3 PM updates are on only one
> replica, i.e. node C, and the minimum consistency needed is QUORUM, so
> join_ring=false would help. But this is a very specific use case.
>
> Thanks
> Anuj
>
>
