[ https://issues.apache.org/jira/browse/SLING-5030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14906157#comment-14906157 ]
Stefan Egli commented on SLING-5030:
------------------------------------

Done #2 in http://svn.apache.org/viewvc?rev=1705024&view=rev

> replace isolated mode with (larger) TOPOLOGY_CHANGING phase
> -----------------------------------------------------------
>
>                 Key: SLING-5030
>                 URL: https://issues.apache.org/jira/browse/SLING-5030
>             Project: Sling
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: Discovery Impl 1.0.2
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>             Fix For: Discovery Impl 1.1.8
>
>
> As [described in SLING-3432|https://issues.apache.org/jira/browse/SLING-3432?focusedCommentId=14492494&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14492494],
> one major reason why duplicate leaders happen in discovery.impl is the
> isolated mode. The rule of the discovery API is that every instance is
> always in a cluster, which makes sense. However, when the connection to the
> cluster (i.e. to the repository) is faulty or delayed for some reason, and
> the remaining cluster no longer considers the local instance alive (i.e. its
> heartbeats have timed out), the local instance currently notices this
> 'isolated' state and wraps itself into a pseudo cluster consisting only of
> itself, of which it is by definition the leader.
> This is completely wrong: there should be no isolated mode. When this cut-off
> from the cluster happens, the local instance should immediately send out a
> TOPOLOGY_CHANGING event and remain in that state until things have settled
> with the repository and it has successfully taken part in a voting. Only then
> can it send out a TOPOLOGY_CHANGED event.
> This should fix a large number of the situations in which SLING-3432 has
> been seen.
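
The behaviour proposed in the description can be illustrated with a small state machine. This is only a sketch under stated assumptions, not the discovery.impl code: the class `TopologyStateSketch`, its methods `onCutOffFromCluster` and `onVotingSucceeded`, and the local `EventType` enum are hypothetical names made up for illustration. Only the event names themselves come from the issue and the discovery API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed behaviour; not the actual discovery.impl code.
class TopologyStateSketch {

    // Local stand-in for the discovery API's topology event types.
    enum EventType { TOPOLOGY_INIT, TOPOLOGY_CHANGING, TOPOLOGY_CHANGED }

    private boolean initialized = false; // true once the first topology was announced
    private boolean changing = false;    // true while in the TOPOLOGY_CHANGING phase
    private final List<EventType> sent = new ArrayList<>();

    // Assumed hook: the local heartbeat timed out or the repository is unreachable.
    // Instead of forming a one-instance pseudo cluster, only announce CHANGING.
    void onCutOffFromCluster() {
        if (initialized && !changing) {
            changing = true;
            sent.add(EventType.TOPOLOGY_CHANGING);
        }
        // Already changing (or not yet initialized): nothing new to announce.
    }

    // Assumed hook: a voting that includes the local instance has concluded.
    void onVotingSucceeded() {
        if (!initialized) {
            initialized = true;
            sent.add(EventType.TOPOLOGY_INIT);
        } else if (changing) {
            changing = false;
            sent.add(EventType.TOPOLOGY_CHANGED);
        }
        // Other cases are outside the scope of this sketch.
    }

    List<EventType> sentEvents() {
        return sent;
    }

    public static void main(String[] args) {
        TopologyStateSketch s = new TopologyStateSketch();
        s.onVotingSucceeded();   // initial topology established
        s.onCutOffFromCluster(); // connection to the repository lost
        s.onCutOffFromCluster(); // still cut off: no duplicate event
        s.onVotingSucceeded();   // back in the cluster after a successful voting
        System.out.println(s.sentEvents());
        // prints [TOPOLOGY_INIT, TOPOLOGY_CHANGING, TOPOLOGY_CHANGED]
    }
}
```

The point of the sketch is that the cut-off instance never announces a topology in which it is the only member and therefore the leader. It stays in the changing phase until a voting has succeeded.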