[
https://issues.apache.org/jira/browse/SLING-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Stefan Egli reassigned SLING-4627:
----------------------------------
Assignee: Stefan Egli
> TOPOLOGY_CHANGED in an eventually consistent repository
> -------------------------------------------------------
>
> Key: SLING-4627
> URL: https://issues.apache.org/jira/browse/SLING-4627
> Project: Sling
> Issue Type: Improvement
> Components: Extensions
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Priority: Critical
> Attachments: SLING-4627.patch, SLING-4627.patch
>
>
> This is a parent ticket describing the +coordination effort needed between
> properly sending TOPOLOGY_CHANGED when running ontop of an eventually
> consistent repository+. These findings are independent of the implementation
> details used inside the discovery implementation, so apply to discovery.impl,
> discovery.etcd/.zookeeper/.oak etc. Tickets to implement this for specific
> implementation are best created separately (eg sub-task or related..). Also
> note that this assumes immediately sending TOPOLOGY_CHANGING as described [in
> SLING-3432|https://issues.apache.org/jira/browse/SLING-3432?focusedCommentId=14492494&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14492494]
> h5. The spectrum of possible TOPOLOGY_CHANGED events include the following
> scenarios:
> || scenario || classification || action ||
> | A. change is completely outside of local cluster | (/) uncritical | changes
> outside the cluster are considered uncritical for this exercise. |
> | B. a new instance joins the local cluster, this new instance is by contract
> not the leader (leader must be stable \[0\]) | (/) uncritical | a join of an
> instance is uncritical due to the fact that it merely joins the cluster and
> has thus no 'backlog' of changes that might be propagating through the
> (eventually consistent) repository. |
> | C. a non-leader *leaves* the local cluster | (x) *critical* | changes that
> were written by the leaving instance might still not be *seen* by all
> surviving (ie it can be that discovery is faster than the repository) and
> this must be assured before sending out TOPOLOGY_CHANGED. This is because the
> leaving instance could have written changes that are *topology dependent* and
> thus those changes must first be settled in the repository before continuing
> with a *new topology*. |
> | D. the leader *leaves* the local cluster (and thus a new leader is elected)
> | (x)(x) *very critical* | same as C except that this is more critical due to
> the fact that the leader left |
> | E. -the leader of the local cluster changes (without leaving)- this is not
> supported by contract (leader must be stable \[0\]) | (/) -irrelevant- | |
> So both C and D are about an instance leaving. And as mentioned above the
> survivors must assure they have read all changes of the leavers. There are
> two parts to this:
> * the leaver could have pending writes that are not yet in mongoD: I don't
> think this is the case. The only thing that can remain could be an
> uncommitted branch and that would be rolled back afaik.
> ** Exception to this is a partition: where the leaver didn't actually crash
> but is still hooked to the repository. *For this I'm not sure how it can be
> solved* yet.
> * the survivers could however not yet have read all changes (pending in the
> background read) and one way to make sure they did is to have each surviving
> instance write a (pseudo-) sync token to the repository. Once all survivors
> have seen this sync token of all other survivors, the assumption is that all
> pending changes are "flushed" through the eventually consistent repository
> and that it is safe to send out a TOPOLOGY_CHANGED event.
> * this sync token must be *conflict free* and could be eg:
> {{/var/discovery/oak/clusterInstances/<slingId>/syncTokens/<newViewId>}} -
> where {{newViewId}} is defined by whatever discovery mechanism is used
> * a special case is when only one instance is remaining. It can then not wait
> for any other survivor to send a sync token. In that case sync tokens would
> not work. All it could then possibly do is to wait for a certain time (which
> should be larger than any expected background-read duration)
> [~mreutegg], [~chetanm] can you pls confirm/comment on the above "flush/sync
> token" approach? Thx!
> /cc [~marett]
> \[0\] - see [getLeader() in
> ClusterView|https://github.com/apache/sling/blob/trunk/bundles/extensions/discovery/api/src/main/java/org/apache/sling/discovery/ClusterView.java]
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)