[
https://issues.apache.org/jira/browse/SLING-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14532182#comment-14532182
]
Stefan Egli commented on SLING-4627:
------------------------------------
[~marett], thanks for the review comments!!
bq. If this understanding is correct, I'd have a few questions:
Yes, that's how I understand it too.
bq. 1.
Absolutely, it is problematic. The original idea was rule #6 - the
{{minEventDelay}} - but that is probably too simplistic.
Your input has actually made me rethink rule #6, and I now think it must be
integrated into OAK-2844 - I've added a comment over there suggesting to use
Oak's insight to delay sending the discovery-lite cluster-changed event (I
have yet to flesh out all the details, but see the initial comment I added
[there|https://issues.apache.org/jira/browse/OAK-2844?focusedCommentId=14531291&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14531291]).
bq. 2.
This should be covered by the definition of the sync token: whenever the
'cluster view detection mechanism' declares there is a new cluster view (be it
via voting or via atomic updates in a shared resource), that view must carry a
unique id - and that id can then be used as the sync token id. And yes, a
particular instance could still be 'somewhat behind' and signal that it is
changing to an older view (especially given eventual-consistency delays as
these sync tokens travel through the repository) - *but* eventually it will
also see the latest and greatest cluster view and send a sync token for that
too. And all the others are already waiting for that new sync token. It is all
based on the fact that the discovery mechanism has a different delay than the
repository - but with this coupling via sync tokens, that difference can be
handled. (PS: the suggestion will be to use the {{ClusterView.getId()}} of
OAK-2844 as the sync token.)
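To make the write-and-wait part of this a bit more concrete, here is a minimal
JCR-level sketch (not part of the attached patch; the class and method names
are invented for illustration). It uses the sync token path layout proposed in
the issue description and assumes the new view id is the
{{ClusterView.getId()}} of OAK-2844: each survivor writes its own token node
for the new view and then polls until it sees the corresponding token of every
other survivor:
{code:java}
import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

// Illustrative sketch only - not part of the patch. The path layout follows the
// issue description; assumes /var/discovery/oak/clusterInstances already exists.
public class SyncTokenSketch {

    private static final String BASE = "/var/discovery/oak/clusterInstances";

    // writes /var/discovery/oak/clusterInstances/<slingId>/syncTokens/<newViewId>
    public static void writeSyncToken(Session session, String slingId, String newViewId)
            throws RepositoryException {
        Node instance = getOrCreate(session.getNode(BASE), slingId);
        Node tokens = getOrCreate(instance, "syncTokens");
        if (!tokens.hasNode(newViewId)) {
            tokens.addNode(newViewId);
        }
        session.save();
    }

    // polls until every survivor has written its token for newViewId, or until timeout
    public static boolean waitForSurvivors(Session session, Iterable<String> survivorSlingIds,
            String newViewId, long timeoutMillis) throws RepositoryException, InterruptedException {
        final long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            session.refresh(false); // pick up changes that arrived via the background read
            boolean allSeen = true;
            for (String slingId : survivorSlingIds) {
                if (!session.nodeExists(BASE + "/" + slingId + "/syncTokens/" + newViewId)) {
                    allSeen = false;
                    break;
                }
            }
            if (allSeen) {
                return true; // safe to promote TOPOLOGY_CHANGING to TOPOLOGY_CHANGED
            }
            Thread.sleep(1000);
        }
        return false; // timed out - better to stay in TOPOLOGY_CHANGING than to send too early
    }

    private static Node getOrCreate(Node parent, String name) throws RepositoryException {
        return parent.hasNode(name) ? parent.getNode(name) : parent.addNode(name);
    }
}
{code}
If the wait times out, the instance would simply stay in TOPOLOGY_CHANGING
rather than risk sending TOPOLOGY_CHANGED too early.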
{quote}I don't know the details of Oak very well, but maybe there is queue of
data to be replicated somewhere. Getting a hand on this queue may offer such
guarantee that data has been replicated up to the point in time X. Assuming
such queue exists each instance could write a piece of data at the time X and
wait until it sees it out of the queue (or written in the Journal). This would
allow to keep each instance to care only about themselves.{quote}
That is a good point indeed! And as mentioned, it made me rethink rule #6: I
now believe OAK-2844 should be used to exploit Oak's inside knowledge - it
already has the last known revision ids of all instances, so it just needs to
[connect the
dots|https://issues.apache.org/jira/browse/OAK-2844?focusedCommentId=14531291&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14531291].
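Roughly, the check I have in mind looks like the following (purely a
hypothetical sketch - none of these types exist in Oak, revisions are
simplified to plain longs, and the real logic would of course be built on
whatever OAK-2844 ends up exposing):
{code:java}
// Hypothetical sketch only - these types are NOT Oak API. They just illustrate
// the 'connect the dots' idea: delay the discovery-lite cluster-changed event
// until the local instance has read everything the departed instances wrote.
public class ChangedEventGate {

    // invented interface standing in for whatever OAK-2844 ends up exposing
    public interface ClusterRevisionInfo {
        // last revision written by the given (possibly departed) cluster node
        long lastWrittenRevision(int clusterNodeId);
        // last revision of that cluster node which the local instance has read via background read
        long lastSeenRevision(int clusterNodeId);
    }

    // true once the local instance has caught up with everything the departed
    // instances wrote, ie the cluster-changed event no longer needs to be delayed
    public static boolean caughtUpWith(ClusterRevisionInfo info, int[] departedClusterNodeIds) {
        for (int id : departedClusterNodeIds) {
            if (info.lastSeenRevision(id) < info.lastWrittenRevision(id)) {
                return false; // backlog still pending in the background read
            }
        }
        return true;
    }
}
{code}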
> TOPOLOGY_CHANGED in an eventually consistent repository
> -------------------------------------------------------
>
> Key: SLING-4627
> URL: https://issues.apache.org/jira/browse/SLING-4627
> Project: Sling
> Issue Type: Improvement
> Components: Extensions
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Critical
> Attachments: SLING-4627.patch, SLING-4627.patch
>
>
> This is a parent ticket describing the +coordination effort needed for
> properly sending TOPOLOGY_CHANGED when running on top of an eventually
> consistent repository+. These findings are independent of the implementation
> details used inside the discovery implementation, so they apply to
> discovery.impl, discovery.etcd/.zookeeper/.oak etc. Tickets to implement this
> for specific implementations are best created separately (eg as sub-tasks or
> related tickets). Also note that this assumes immediately sending
> TOPOLOGY_CHANGING as described [in
> SLING-3432|https://issues.apache.org/jira/browse/SLING-3432?focusedCommentId=14492494&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14492494].
> h5. The spectrum of possible TOPOLOGY_CHANGED events includes the following
> scenarios:
> || scenario || classification || action ||
> | A. change is completely outside of the local cluster | (/) uncritical | changes outside the cluster are considered uncritical for this exercise. |
> | B. a new instance joins the local cluster; this new instance is by contract not the leader (the leader must be stable \[0\]) | (/) uncritical | a join of an instance is uncritical because it merely joins the cluster and thus has no 'backlog' of changes that might still be propagating through the (eventually consistent) repository. |
> | C. a non-leader *leaves* the local cluster | (x) *critical* | changes that were written by the leaving instance might not yet have been *seen* by all surviving instances (ie discovery can be faster than the repository), and this must be assured before sending out TOPOLOGY_CHANGED. This is because the leaving instance could have written changes that are *topology dependent*, and those changes must first settle in the repository before continuing with a *new topology*. |
> | D. the leader *leaves* the local cluster (and thus a new leader is elected) | (x)(x) *very critical* | same as C, except that this is more critical because it was the leader that left |
> | E. -the leader of the local cluster changes (without leaving)- this is not supported by contract (the leader must be stable \[0\]) | (/) -irrelevant- | |
> So both C and D are about an instance leaving. And as mentioned above, the
> survivors must ensure they have read all changes of the leaving instances.
> There are two parts to this:
> * the leaver could have pending writes that are not yet in MongoDB: I don't
> think this is the case. The only thing that could remain is an uncommitted
> branch, and that would be rolled back afaik.
> ** The exception to this is a partition, where the leaver didn't actually
> crash but is still hooked to the repository. *For this I'm not sure how it
> can be solved* yet.
> * the survivors could, however, not yet have read all changes (pending in the
> background read), and one way to make sure they did is to have each surviving
> instance write a (pseudo-) sync token to the repository. Once all survivors
> have seen the sync token of all other survivors, the assumption is that all
> pending changes have been "flushed" through the eventually consistent
> repository and that it is safe to send out a TOPOLOGY_CHANGED event.
> * this sync token must be *conflict free* and could be eg:
> {{/var/discovery/oak/clusterInstances/<slingId>/syncTokens/<newViewId>}} -
> where {{newViewId}} is defined by whatever discovery mechanism is used
> * a special case is when only one instance remains. It then cannot wait
> for any other survivor to send a sync token, so sync tokens would
> not work. All it could then possibly do is wait for a certain time (which
> should be larger than any expected background-read duration)
> [~mreutegg], [~chetanm] can you pls confirm/comment on the above "flush/sync
> token" approach? Thx!
> /cc [~marett]
> \[0\] - see [getLeader() in
> ClusterView|https://github.com/apache/sling/blob/trunk/bundles/extensions/discovery/api/src/main/java/org/apache/sling/discovery/ClusterView.java]