[ 
https://issues.apache.org/jira/browse/SLING-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Egli updated SLING-10489:
--------------------------------
    Description: 
Discovery.oak requires that both Oak and Sling are operating normally in order 
to declare victory and announce a new topology.

The startup phase is especially tricky in this regard, since there are multiple 
elements that need to get updated (some are in the Oak layer, some in Sling) :
 * lease & clusterNodeId : this is maintained by Oak
 * idMap : this is maintained by IdMapService (Sling)
 * leaderElectionId : this is maintained by OakViewChecker (Sling)
 * syncToken : this is maintained by SyncTokenService (Sling)

Situations have been seen where Oak started up fine, but higher-level (e.g. 
Sling) bundles were not activated within a reasonable amount of time. This led 
to discovery staying in the TOPOLOGY_CHANGING state for longer than expected.

There should be a mechanism that ignores (suppresses) newly joining instances 
if they start up only partially. However, after a certain timeout this 
mechanism should give up.
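
A minimal sketch of such a suppress-with-timeout rule, for illustration only. 
The class name JoinerSuppressor, the method shouldSuppress, the fullyStarted 
flag and the injected clock are all hypothetical and are not taken from the 
discovery.oak code base. The sketch assumes the caller can already tell whether 
a joining instance has written its idMap entry, leaderElectionId and syncToken.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical helper, not part of discovery.oak. */
final class JoinerSuppressor {

    private final long timeoutMillis;
    private final LongSupplier clock;

    // clusterNodeId -> time at which the instance was first seen partially started
    private final Map<Integer, Long> firstSeenPartial = new HashMap<>();

    JoinerSuppressor(long timeoutMillis, LongSupplier clock) {
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    /**
     * @param clusterNodeId id of the newly joining instance
     * @param fullyStarted  assumed to be true once idMap, leaderElectionId
     *                      and syncToken are all available for the instance
     * @return true if the instance should be ignored for now
     */
    synchronized boolean shouldSuppress(int clusterNodeId, boolean fullyStarted) {
        if (fullyStarted) {
            // Nothing to wait for any more: forget the instance and stop suppressing.
            firstSeenPartial.remove(clusterNodeId);
            return false;
        }
        long now = clock.getAsLong();
        // Remember when this instance was first observed as partially started.
        long firstSeen = firstSeenPartial.computeIfAbsent(clusterNodeId, id -> now);
        // Suppress only until the timeout has elapsed, then give up.
        return now - firstSeen < timeoutMillis;
    }
}
```

In this sketch, once the timeout has elapsed the instance is no longer 
suppressed, so discovery would fall back to its existing behaviour and wait in 
TOPOLOGY_CHANGING. Injecting the clock keeps the timeout logic testable without 
sleeping.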

  was:
Discovery.oak requires that both Oak and Sling are operating normally in order 
to declare victory and announce a new topology.

The startup phase is especially tricky in this regard, since there are multiple 
elements that need to get updated  (some are in the Oak layer, some in Sling) :

* lease & clusterNodeId : this is maintained by Oak
* idMap : this is maintained by IdMapService (Sling)
* leaderElectionId : this is maintained by OakViewChecker (Sling)
* syncToken : this is maintained by SyncTokenService (Sling)

Situations have seen where Oak is startup up fine, but higher level (eg Sling) 
bundles were not activated within a reasonable amount of time. This lead to 
discovery staying in TOPOLOGY_CHANGING state for longer than expected.

There should be a mechanism that ignores (suppresses) newly joining instances 
if they start up only partially. However, after a certain timeout this 
mechanism should give up.


> Ignore partially started, newly joining instances to avoid disturbing 
> discovery (for a while)
> ---------------------------------------------------------------------------------------------
>
>                 Key: SLING-10489
>                 URL: https://issues.apache.org/jira/browse/SLING-10489
>             Project: Sling
>          Issue Type: Improvement
>          Components: Discovery
>    Affects Versions: Discovery Oak 1.2.34
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Major
>             Fix For: Discovery Oak 1.2.36
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Discovery.oak requires that both Oak and Sling are operating normally in 
> order to declare victory and announce a new topology.
> The startup phase is especially tricky in this regard, since there are 
> multiple elements that need to get updated (some are in the Oak layer, some 
> in Sling) :
>  * lease & clusterNodeId : this is maintained by Oak
>  * idMap : this is maintained by IdMapService (Sling)
>  * leaderElectionId : this is maintained by OakViewChecker (Sling)
>  * syncToken : this is maintained by SyncTokenService (Sling)
> Situations have been seen where Oak started up fine, but higher-level (e.g. 
> Sling) bundles were not activated within a reasonable amount of time. This 
> led to discovery staying in the TOPOLOGY_CHANGING state for longer than expected.
> There should be a mechanism that ignores (suppresses) newly joining instances 
> if they start up only partially. However, after a certain timeout this 
> mechanism should give up.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
