[ 
https://issues.apache.org/jira/browse/SLING-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532182#comment-14532182
 ] 

Stefan Egli commented on SLING-4627:
------------------------------------

[~marett], thanks for the review comments!!

bq. If this understanding is correct, I'd have a few questions:
Yes, that's how I understand it too.

bq. 1.
Absolutely, it is problematic. The original idea was rule #6 - the 
{{minEventDelay}} - but that is probably too simplistic. 
Your input has actually made me rethink rule #6 and I think it must be 
integrated into OAK-2844 - I've added a comment over there that suggests using 
Oak's insight to delay sending the discovery-light cluster-changed event (I 
have yet to flesh out all the details, but I added an initial comment 
[there|https://issues.apache.org/jira/browse/OAK-2844?focusedCommentId=14531291&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14531291]).

bq. 2.
This should be covered by the definition of the sync token. Whenever the 
'cluster view detection mechanism' declares a new cluster view (be it via 
voting or via atomically updating a shared resource), that view must carry a 
unique id - and that id can then be used as the sync token id. And yes, a 
particular instance could still be 'somewhat behind' and signal that it is 
switching to an older view (especially given eventual-consistency delays as 
these sync tokens travel through the repository) - *but* eventually it will 
also see the latest cluster view and send a sync token for that, which all the 
others are already waiting for. It is all based on the fact that the discovery 
mechanism has a different delay than the repository - but with this coupling 
via the sync token, that difference can be handled. (PS: the suggestion will be 
to use the {{ClusterView.getId()}} of OAK-2844 as the sync token.)
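
To make that handshake concrete, here is a minimal sketch (not part of the 
attached patches; the base path, the property name and the way survivors are 
enumerated are just assumptions for illustration): each survivor persists the 
new view's id as its sync token, and TOPOLOGY_CHANGED is only sent once every 
survivor's token equals that id.

{code:java}
// Sketch only - illustrates the sync-token handshake described above.
// The base path, property name and polling strategy are assumptions.
import java.util.Collections;
import java.util.Map;

import org.apache.sling.api.resource.ModifiableValueMap;
import org.apache.sling.api.resource.PersistenceException;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.api.resource.ResourceUtil;
import org.apache.sling.api.resource.ValueMap;

public class SyncTokenSketch {

    private static final String BASE = "/var/discovery/oak/clusterInstances";

    /** Persist this instance's sync token for the new view (one resource per slingId keeps it conflict-free). */
    void writeSyncToken(ResourceResolver resolver, String slingId, String newViewId)
            throws PersistenceException {
        Map<String, Object> props = Collections.<String, Object> singletonMap(
                "jcr:primaryType", "nt:unstructured");
        Resource tokens = ResourceUtil.getOrCreateResource(resolver,
                BASE + "/" + slingId + "/syncTokens", props, null, false);
        // assumes the resolver has write access to the discovery subtree
        ModifiableValueMap mvm = tokens.adaptTo(ModifiableValueMap.class);
        mvm.put("currentToken", newViewId);
        resolver.commit();
    }

    /** True once every surviving instance has written a sync token for newViewId. */
    boolean allSurvivorsSynced(ResourceResolver resolver, Iterable<String> survivorSlingIds,
            String newViewId) {
        resolver.refresh(); // pick up tokens written by the other instances
        for (String slingId : survivorSlingIds) {
            Resource tokens = resolver.getResource(BASE + "/" + slingId + "/syncTokens");
            String token = tokens == null ? null
                    : tokens.adaptTo(ValueMap.class).get("currentToken", String.class);
            if (!newViewId.equals(token)) {
                return false; // this survivor has not yet acknowledged the new view
            }
        }
        return true; // all tokens seen - safe to send TOPOLOGY_CHANGED
    }
}
{code}

Writing one token resource per slingId matches the per-instance path proposed 
in the ticket below and avoids write conflicts between survivors.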

{quote}I don't know the details of Oak very well, but maybe there is a queue of 
data to be replicated somewhere. Getting a hand on this queue may offer such a 
guarantee that data has been replicated up to the point in time X. Assuming 
such a queue exists, each instance could write a piece of data at time X and 
wait until it sees it come out of the queue (or written to the Journal). This 
would allow each instance to care only about itself.{quote}

That is a good point indeed! As mentioned, it made me rethink rule #6, and I 
now believe OAK-2844 should be used to leverage Oak's inside knowledge - it 
already has the last known revision ids of all instances, so it just needs to 
[connect the dots|https://issues.apache.org/jira/browse/OAK-2844?focusedCommentId=14531291&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14531291].
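
Purely to illustrate that 'connect the dots' idea, a conceptual sketch follows; 
every name in it is hypothetical (i.e. not actual OAK-2844 API). Oak already 
tracks the last revision known to have been written by each cluster node, so 
the discovery layer could defer the cluster-changed event until the local 
instance has caught up with everything the leaver wrote.

{code:java}
// Conceptual sketch only - every type and method below is hypothetical, not real Oak API.
public class DeferUntilCaughtUpSketch {

    /** Hypothetical view of the revision bookkeeping assumed to be exposed by OAK-2844. */
    interface RevisionInfo {
        /** Last revision known to have been written by the given cluster node. */
        long lastKnownRevisionOf(int clusterNodeId);

        /** Highest revision of that cluster node already visible to the local instance. */
        long locallyVisibleRevisionOf(int clusterNodeId);
    }

    /** True once the local instance has read everything the leaving instance wrote. */
    boolean safeToSendClusterChanged(RevisionInfo info, int leavingClusterNodeId) {
        return info.locallyVisibleRevisionOf(leavingClusterNodeId)
                >= info.lastKnownRevisionOf(leavingClusterNodeId);
    }
}
{code}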

> TOPOLOGY_CHANGED in an eventually consistent repository
> -------------------------------------------------------
>
>                 Key: SLING-4627
>                 URL: https://issues.apache.org/jira/browse/SLING-4627
>             Project: Sling
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Critical
>         Attachments: SLING-4627.patch, SLING-4627.patch
>
>
> This is a parent ticket describing the +coordination effort needed to 
> properly send TOPOLOGY_CHANGED when running on top of an eventually 
> consistent repository+. These findings are independent of the implementation 
> details used inside the discovery implementation, and so apply to 
> discovery.impl, discovery.etcd/.zookeeper/.oak etc. Tickets to implement this 
> for specific implementations are best created separately (e.g. as sub-tasks 
> or related tickets). Also note that this assumes immediately sending 
> TOPOLOGY_CHANGING as described [in 
> SLING-3432|https://issues.apache.org/jira/browse/SLING-3432?focusedCommentId=14492494&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14492494]
> h5. The spectrum of possible TOPOLOGY_CHANGED events includes the following scenarios:
> || scenario || classification || action ||
> | A. a change completely outside of the local cluster | (/) uncritical | changes outside the cluster are considered uncritical for this exercise. |
> | B. a new instance joins the local cluster; by contract this new instance is not the leader (the leader must be stable \[0\]) | (/) uncritical | a join is uncritical because the instance merely joins the cluster and thus has no 'backlog' of changes that might still be propagating through the (eventually consistent) repository. |
> | C. a non-leader *leaves* the local cluster | (x) *critical* | changes written by the leaving instance might not yet be *seen* by all survivors (i.e. discovery can be faster than the repository), and this must be assured before sending out TOPOLOGY_CHANGED. The leaving instance could have written changes that are *topology dependent*, so those changes must first be settled in the repository before continuing with a *new topology*. |
> | D. the leader *leaves* the local cluster (and thus a new leader is elected) | (x)(x) *very critical* | same as C, except more critical because the leader left. |
> | E. -the leader of the local cluster changes (without leaving)- this is not supported by contract (the leader must be stable \[0\]) | (/) -irrelevant- | |
> So both C and D are about an instance leaving. And as mentioned above, the 
> survivors must ensure they have read all changes of the leaver. There are two 
> parts to this:
> * the leaver could have pending writes that are not yet in MongoDB: I don't 
> think this is the case. The only thing that could remain is an uncommitted 
> branch, and that would be rolled back afaik.
> ** The exception to this is a partition, where the leaver didn't actually 
> crash but is still connected to the repository. *I'm not yet sure how this 
> can be solved.*
> * the survivors, however, might not yet have read all changes (still pending 
> in the background read). One way to make sure they have is to have each 
> surviving instance write a (pseudo-) sync token to the repository. Once all 
> survivors have seen the sync tokens of all other survivors, the assumption is 
> that all pending changes have been "flushed" through the eventually 
> consistent repository and that it is safe to send out a TOPOLOGY_CHANGED 
> event. 
> * this sync token must be *conflict free* and could be e.g. 
> {{/var/discovery/oak/clusterInstances/<slingId>/syncTokens/<newViewId>}} - 
> where {{newViewId}} is defined by whatever discovery mechanism is used
> * a special case is when only one instance remains. It cannot then wait for 
> any other survivor to send a sync token, so sync tokens would not work. All 
> it can then do is wait for a certain amount of time (which should be longer 
> than any expected background-read duration)
> [~mreutegg], [~chetanm] can you please confirm/comment on the above 
> "flush/sync token" approach? Thx!
> /cc [~marett]
> \[0\] - see [getLeader() in 
> ClusterView|https://github.com/apache/sling/blob/trunk/bundles/extensions/discovery/api/src/main/java/org/apache/sling/discovery/ClusterView.java]


