Stefan Egli created SLING-4603:
----------------------------------

             Summary: oak-documentMk based discovery.impl
                 Key: SLING-4603
                 URL: https://issues.apache.org/jira/browse/SLING-4603
             Project: Sling
          Issue Type: New Feature
          Components: Extensions
            Reporter: Stefan Egli
            Assignee: Stefan Egli


When discovery is used in a stack based on jackrabbit oak as the repository, 
the current way of discovering instances sounds somewhat like duplicated work: 
oak, or more precisely documentnodestore, itself has a low-level [lease 
mechanism|http://jackrabbit.apache.org/oak/docs/nodestore/documentmk.html] 
where it stores information about the cluster nodes including a {{leaseEnd}} 
indicating at what time others can consider a particular node as dead/crashed. 
This corresponds pretty much to the discovery.impl heartbeat mechanism. So in 
a stack which is built on top of oak-documentMk, we could make use of this 
fact and delegate the decision about whether a node in a cluster is alive or 
not to the oak layer. Also, with OAK-2597 the relevant information, 
{{ActiveClusterNodes}}, is nicely exposed via JMX, so that can become the new 
source of truth defining the cluster view.
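
To illustrate the idea, a minimal sketch of how a lease-based liveness decision could look, assuming each cluster node is identified by an integer id and its {{leaseEnd}} is available as epoch milliseconds. The class {{LeaseView}} and its method are hypothetical names for illustration only, not oak or discovery API:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper (not oak API): derives live cluster nodes from lease end times. */
final class LeaseView {
    private LeaseView() {
    }

    /**
     * Returns the ids of all cluster nodes whose lease has not yet expired.
     *
     * @param leaseEndByClusterId lease end per cluster node id, assumed to be epoch milliseconds
     * @param nowMillis           the current time in epoch milliseconds
     */
    static Set<Integer> activeClusterNodes(Map<Integer, Long> leaseEndByClusterId, long nowMillis) {
        Set<Integer> active = new TreeSet<>();
        for (Map.Entry<Integer, Long> entry : leaseEndByClusterId.entrySet()) {
            // A node counts as alive only while its lease end lies strictly in the future.
            if (entry.getValue() > nowMillis) {
                active.add(entry.getKey());
            }
        }
        return active;
    }
}
```

In the proposed setup discovery would not compute this itself but read the result from oak; the sketch only shows why a separate discovery heartbeat carries no additional information.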

When replacing discovery-owned heartbeats with oak-owned ones, there is one 
important detail to watch out for: an instance can no longer easily determine 
whether another instance in the cluster has this new discovery bundle 
activated or not. Hence, when a voting happens, there is no guarantee that all 
{{active}} nodes (as reported by oak-documentMk) are actually going to respond. 
So the 'silent instance due to deactivated discovery bundle' case needs special 
attention/handling.
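
A minimal sketch of how such silent instances could be detected during a voting, assuming instances are identified by strings and the set of respondents is collected after some timeout. {{VotingCheck}} and its methods are hypothetical; what should happen with a silent instance is deliberately left open here:

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: compares oak's view of active nodes with the voting respondents. */
final class VotingCheck {
    private VotingCheck() {
    }

    /**
     * Returns the instances that oak reports as active but that did not respond
     * to the voting. The input sets are not modified.
     */
    static Set<String> silentInstances(Set<String> activeAccordingToOak, Set<String> respondedToVoting) {
        Set<String> silent = new TreeSet<>(activeAccordingToOak);
        silent.removeAll(respondedToVoting);
        return silent;
    }

    /** True only if every instance that oak considers active took part in the voting. */
    static boolean allActiveResponded(Set<String> activeAccordingToOak, Set<String> respondedToVoting) {
        return silentInstances(activeAccordingToOak, respondedToVoting).isEmpty();
    }
}
```

A non-empty result marks exactly the case described above: instances that are alive from oak's point of view but whose discovery bundle may be deactivated.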

Other than that, given the normal case of all {{active}} nodes having the 
bundle activated, the voting mechanism can stay the same as in discovery.impl. 
The topology connectors can be treated the same too (by storing announcements 
to their respective 
{{/var/discovery/clusterInstances/<slingId>/announcements/<announcerSlingId>}} 
node). The properties can be handled the same too (by storing to the 
{{/properties}} node). The only thing that gets replaced is the {{heartbeats}}.
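
For illustration, the announcement location mentioned above could be built as follows. This is only a sketch: {{DiscoveryPaths}} is a hypothetical helper, and the check for slashes is an assumption added to keep ids from changing the path depth:

```java
/** Hypothetical helper: builds the repository path of a topology connector announcement. */
final class DiscoveryPaths {
    /** Root of the per-instance nodes, taken from the path given in this issue. */
    static final String CLUSTER_INSTANCES = "/var/discovery/clusterInstances";

    private DiscoveryPaths() {
    }

    /** Path of the announcement stored under {@code slingId} by {@code announcerSlingId}. */
    static String announcementPath(String slingId, String announcerSlingId) {
        requireSingleSegment(slingId);
        requireSingleSegment(announcerSlingId);
        return CLUSTER_INSTANCES + "/" + slingId + "/announcements/" + announcerSlingId;
    }

    // Assumption: an id must be exactly one path segment, so it may not be empty or contain '/'.
    private static void requireSingleSegment(String id) {
        if (id == null || id.isEmpty() || id.indexOf('/') >= 0) {
            throw new IllegalArgumentException("not a valid id: " + id);
        }
    }
}
```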

Note that in order for such an oak-based discovery.impl to work, this oak-lease 
mechanism must be very robust (it should be so in its own interest already). 
However, there are currently a few issues that should probably be resolved 
before discovery can be based on this: OAK-2739, OAK-2682 and OAK-2681 
are currently known in this area.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)