[
https://issues.apache.org/jira/browse/SLING-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906370#comment-14906370
]
Stefan Egli commented on SLING-5030:
------------------------------------
Also improved a log output in http://svn.apache.org/viewvc?rev=1705059&view=rev
> replace isolated mode with (larger) TOPOLOGY_CHANGING phase
> -----------------------------------------------------------
>
> Key: SLING-5030
> URL: https://issues.apache.org/jira/browse/SLING-5030
> Project: Sling
> Issue Type: Bug
> Components: Extensions
> Affects Versions: Discovery Impl 1.0.2
> Reporter: Stefan Egli
> Assignee: Stefan Egli
> Fix For: Discovery Impl 1.1.8
>
>
> As [described in
> SLING-3432|https://issues.apache.org/jira/browse/SLING-3432?focusedCommentId=14492494&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14492494]
> one major reason why duplicate leaders occur in discovery.impl is the
> isolated mode. The rule of the discovery API is that every instance is always
> in a cluster, which makes sense. However, when the connection to the cluster
> (i.e. to the repository) is faulty or delayed for some reason, and the
> remaining cluster no longer considers the local instance alive (i.e. its
> heartbeats have timed out), the local instance currently notices this
> 'isolated' state and wraps itself into a pseudo-cluster consisting only of
> itself, of which it is by definition the leader.
> This is completely wrong: there should be no isolated mode. When this 'cut
> off' from the cluster happens, the local instance should immediately send out
> a TOPOLOGY_CHANGING event and remain in this state until things have settled
> with the repository and it has successfully taken part in a voting. Only then
> can it send out a TOPOLOGY_CHANGED event.
> This should fix a large number of situations where SLING-3432 has been seen.
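
The behaviour proposed in the description above can be read as a small
two-state machine. The following is only a minimal sketch of that reading, not
the discovery.impl code: the class TopologyStateSketch, its methods
(onCutOffFromCluster, onVotingCompleted, sentEvents) and both enums are
hypothetical names made up for illustration, and the sketch assumes an
established topology view already exists when it starts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from discovery.impl.
class TopologyStateSketch {

    // Only the two event types named in the issue description are modelled.
    enum Event { TOPOLOGY_CHANGING, TOPOLOGY_CHANGED }

    enum State { ESTABLISHED, CHANGING }

    // Assumption: the instance starts with an established topology view.
    private State state = State.ESTABLISHED;
    private final List<Event> sent = new ArrayList<>();

    // Called when the local instance notices it is cut off from the cluster,
    // e.g. because its heartbeats have timed out. There is no pseudo-cluster
    // fallback: the instance only announces that the topology is changing.
    void onCutOffFromCluster() {
        if (state == State.ESTABLISHED) {
            state = State.CHANGING;
            sent.add(Event.TOPOLOGY_CHANGING);
        }
        // Already CHANGING: stay there and do not send a duplicate event.
    }

    // Called when a voting round finishes. Only a voting in which the local
    // instance successfully took part ends the CHANGING phase.
    void onVotingCompleted(boolean localInstanceTookPart) {
        if (state == State.CHANGING && localInstanceTookPart) {
            state = State.ESTABLISHED;
            sent.add(Event.TOPOLOGY_CHANGED);
        }
    }

    // Returns a copy of the events sent so far, in order.
    List<Event> sentEvents() {
        return new ArrayList<>(sent);
    }

    public static void main(String[] args) {
        TopologyStateSketch instance = new TopologyStateSketch();
        instance.onCutOffFromCluster();    // sends TOPOLOGY_CHANGING
        instance.onVotingCompleted(false); // repository not settled, no event
        instance.onVotingCompleted(true);  // sends TOPOLOGY_CHANGED
        System.out.println(instance.sentEvents());
    }
}
```

The point of the sketch is that the instance never declares itself leader of a
one-member cluster while cut off; listeners only see TOPOLOGY_CHANGING until a
successful voting allows TOPOLOGY_CHANGED.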
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)