Stefan Egli created SLING-5030:
----------------------------------

             Summary: replace isolated mode with (larger) TOPOLOGY_CHANGING phase
                 Key: SLING-5030
                 URL: https://issues.apache.org/jira/browse/SLING-5030
             Project: Sling
          Issue Type: Bug
          Components: Extensions
    Affects Versions: Discovery Impl 1.0.2
            Reporter: Stefan Egli
            Assignee: Stefan Egli
             Fix For: Discovery Impl 1.1.8


As [described in SLING-3432|https://issues.apache.org/jira/browse/SLING-3432?focusedCommentId=14492494&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14492494],
one major reason why duplicate leaders happen in discovery.impl is the
isolated mode. The rule of the discovery API is that every instance is always
in a cluster, which makes sense. However, when the connection to the cluster
(i.e. to the repository) is faulty or delayed for some reason, and the
remaining cluster no longer considers the local instance alive (i.e. its
heartbeats have timed out), the local instance currently notices this
'isolated' state and wraps itself into a pseudo cluster consisting only of
itself, of which it is by definition the leader. It can thus end up as a
second leader alongside the one in the remaining cluster.

This is completely wrong: there should be no isolated mode. When the local
instance is cut off from the cluster, it should immediately send out a
TOPOLOGY_CHANGING event and remain in this state until things have settled
with the repository and it has successfully taken part in a voting. Only then
can it send out a TOPOLOGY_CHANGED event.
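
The proposed behaviour could be sketched as a small state machine. This is
only an illustration under assumptions, not the discovery.impl code: the class
TopologyPhaseSketch and its methods are hypothetical names, and only the two
event names are taken from this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these class or method names come from discovery.impl.
class TopologyPhaseSketch {

    // Event names as used in this issue.
    enum Event { TOPOLOGY_CHANGING, TOPOLOGY_CHANGED }

    // Events "sent" so far, recorded in order for inspection.
    private final List<Event> sent = new ArrayList<>();

    // True while the instance is between TOPOLOGY_CHANGING and TOPOLOGY_CHANGED.
    private boolean changing = false;

    // Called when the instance notices it is cut off (e.g. its heartbeats timed out).
    void onCutOffFromCluster() {
        if (!changing) {
            changing = true;
            sent.add(Event.TOPOLOGY_CHANGING);
        }
        // No pseudo cluster is built here, so the instance never declares itself leader.
    }

    // Called once the instance has successfully taken part in a voting.
    void onVotingSucceeded() {
        if (changing) {
            changing = false;
            sent.add(Event.TOPOLOGY_CHANGED);
        }
    }

    boolean isChanging() {
        return changing;
    }

    List<Event> sentEvents() {
        return new ArrayList<>(sent);
    }
}
```

In this sketch, repeated cut-off notifications while already in the changing
phase do not send a second TOPOLOGY_CHANGING, and TOPOLOGY_CHANGED is only
sent after a successful voting.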

This should fix a large number of situations where SLING-3432 has been seen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
