[jira] [Created] (SLING-4627) TOPOLOGY_CHANGED in an eventually consistent repository

Stefan Egli (JIRA) Wed, 15 Apr 2015 07:40:42 -0700

Stefan Egli created SLING-4627:
----------------------------------

             Summary: TOPOLOGY_CHANGED in an eventually consistent repository
                 Key: SLING-4627
                 URL: https://issues.apache.org/jira/browse/SLING-4627
             Project: Sling
          Issue Type: Improvement
          Components: Extensions
            Reporter: Stefan Egli
            Priority: Critical



This is a parent ticket describing the +coordination effort needed between 
properly sending TOPOLOGY_CHANGED when running ontop of an eventually 
consistent repository+. These findings are independent of the implementation 
details used inside the discovery implementation, so apply to discovery.impl, 
discovery.etcd/.zookeeper/.oak etc. Tickets to implement this for specific 
implementation are best created separately (eg sub-task or related..). Also 
note that this assumes immediately sending TOPOLOGY_CHANGING as described [in 
GRANITE-3432|https://issues.apache.org/jira/browse/SLING-3432?focusedCommentId=14492494&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14492494]

h5. The spectrum of possible TOPOLOGY_CHANGED events include the following 
scenarios:
|| scenario || classification || action ||
| A. change is completely outside of local cluster | (/) uncritical | changes 
outside the cluster are considered uncritical for this exercise. |
| B. a new instance joins the local cluster, this new instance is by contract 
not the leader (leader must be stable \[0\]) | (/) uncritical | a join of an 
instance is uncritical due to the fact that it merely joins the cluster and has 
thus no 'backlog' of changes that might be propagating through the (eventually 
consistent) repository. |
| C. a non-leader *leaves* the local cluster | (x) *critical* | changes that 
were written by the leaving instance might still not be *seen* by all surviving 
(ie it can be that discovery is faster than the repository) and this must be 
assured before sending out TOPOLOGY_CHANGED. This is because the leaving 
instance could have written changes that are *topology dependent* and thus 
those changes must first be settled in the repository before continuing with a 
*new topology*. |
| D. the leader *leaves* the local cluster (and thus a new leader is elected) | 
(x)(x) *very critical* | same as C except that this is more critical due to the 
fact that the leader left |
| E. -the leader of the local cluster changes (without leaving)- this is not 
supported by contract (leader must be stable \[0\]) | (/) -irrelevant- | |

So both C and D are about an instance leaving. And as mentioned above the 
survivors must assure they have read all changes of the leavers. There are two 
parts to this:
* the leaver could have pending writes that are not yet in mongoD: I don't 
think this is the case. The only thing that can remain could be an uncommitted 
branch and that would be rolled back afaik.
** Exception to this is a partition: where the leaver didn't actually crash but 
is still hooked to the repository. *For this I'm not sure how it can be solved* 
yet.
* the survivers could however not yet have read all changes (pending in the 
background read) and one way to make sure they did is to have each surviving 
instance write a (pseudo-) sync token to the repository. Once all survivors 
have seen this sync token of all other survivors, the assumption is that all 
pending changes are "flushed" through the eventually consistent repository and 
that it is safe to send out a TOPOLOGY_CHANGED event. 
* this sync token must be *conflict free* and could be eg: 
{{/var/discovery/oak/clusterInstances/<slingId>/syncTokens/<newViewId>}} - 
where {{newViewId}} is defined by whatever discovery mechanism is used
* a special case is when only one instance is remaining. It can then not wait 
for any other survivor to send a sync token. In that case sync tokens would not 
work. All it could then possibly do is to wait for a certain time (which should 
be larger than any expected background-read duration)

[~mreutegg], [~chetanm] can you pls confirm/comment on the above "flush/sync 
token" approach? Thx!

/cc [~marett]

\[0\] - see [getLeader() in 
ClusterView|https://github.com/apache/sling/blob/trunk/bundles/extensions/discovery/api/src/main/java/org/apache/sling/discovery/ClusterView.java]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (SLING-4627) TOPOLOGY_CHANGED in an eventually consistent repository

Reply via email to