[ 
https://issues.apache.org/jira/browse/SLING-4603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14743529#comment-14743529
 ] 

Bertrand Delacretaz edited comment on SLING-4603 at 9/14/15 1:52 PM:
---------------------------------------------------------------------

I agree that if Oak provides coordination/consensus features similar to what 
Sling discovery needs, it makes sense to (optionally) use the Oak functionality 
instead of Sling's own implementation, when running on Oak.

My general view on this is that for larger systems it's good to be able to use 
industry standard coordination/consensus systems, so if this helps make it 
easier to replace the discovery implementation with a different one that's a 
plus.

For now the existing Sling implementation should remain available and tested, 
at least until there's a recommended and tested way to use an industry 
standard system for users who are not running Sling on Oak.


was (Author: bdelacretaz):
I agree that if Oak provides coordination/consensus features similar to what 
Sling discovery needs, it makes sense to (optionally) use the Oak functionality 
instead of Sling's own implementation, when running on Oak.

My general view on this is that for larger systems it's good to be able to use 
industry standard coordination/consensus systems, so if this helps make it 
easier to replace the discovery implementation with a different one that's a 
plus.

> discovery.oak: oak-based discovery implementation
> -------------------------------------------------
>
>                 Key: SLING-4603
>                 URL: https://issues.apache.org/jira/browse/SLING-4603
>             Project: Sling
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>
> When discovery is used in a stack based on jackrabbit oak as the repository, 
> the current way of discovering instances looks somewhat like duplicated work: 
> oak, or more precisely documentnodestore, itself has a low-level [lease 
> mechanism|http://jackrabbit.apache.org/oak/docs/nodestore/documentmk.html] 
> where it stores information about the cluster nodes including a {{leaseEnd}} 
> indicating at what time others can consider a particular node as 
> dead/crashed. This corresponds pretty much to the discovery.impl heartbeat 
> mechanism. And in a stack which is built on top of oak-documentMk, we could be 
> making use of this fact and delegate the decision about whether a node in a 
> cluster is alive or not to the oak layer. Also, with OAK-2597 the relevant 
> information: {{ActiveClusterNodes}} is nicely exposed via JMX - so that can 
> become the new source of truth defining the cluster view.
> When replacing discovery-owned heartbeats with oak-owned ones, there is one 
> important detail to watch out for: an instance can no longer easily determine 
> whether another instance in the cluster has this new discovery bundle 
> activated or not. Hence it is not guaranteed that, when a vote happens, all 
> {{active}} nodes (as reported by oak-documentMk) are actually going to 
> respond. So the 'silent instance due to deactivated discovery bundle' case 
> needs special attention/handling.
> Other than that, given the normal case of all {{active}} nodes having the 
> bundle activated, the voting mechanism can stay the same as in 
> discovery.impl. The topology connectors can be treated the same too (by 
> storing announcements to their respective 
> {{/var/discovery/clusterInstances/<slingId>/announcements/<announcerSlingId>}}
>  node). The properties can be handled the same too (by storing to the 
> {{/properties}} node). The only thing that gets replaced is the {{heartbeats}}.
> Note that in order for such an oak-based discovery.impl to work, this 
> oak-lease mechanism must be very robust (it should be so in its own interest 
> already). However, there are currently a few issues that should probably be 
> resolved before discovery can be based on this: OAK-2739, OAK-2682 and 
> OAK-2681 are currently known in this area.
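
To illustrate the idea in the description above, here is a minimal Java sketch of lease-based liveness and of detecting the 'silent instance' case during a vote. It is only an illustration under stated assumptions: the class LeaseBasedView, the ClusterNodeInfo holder and the methods isAlive, activeIds and silentIds are hypothetical names, not Oak or Sling API, and leaseEnd is assumed to be an epoch timestamp in milliseconds. A real implementation would presumably read the active cluster nodes from Oak rather than from an in-memory list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch, not part of Oak or Sling. */
final class LeaseBasedView {

    /** Hypothetical holder for one cluster node: its Sling id and lease end (assumed epoch millis). */
    static final class ClusterNodeInfo {
        final String slingId;
        final long leaseEnd;

        ClusterNodeInfo(String slingId, long leaseEnd) {
            this.slingId = slingId;
            this.leaseEnd = leaseEnd;
        }
    }

    /** A node is considered alive as long as its lease has not yet expired. */
    static boolean isAlive(ClusterNodeInfo node, long now) {
        return now < node.leaseEnd;
    }

    /** Returns the ids of all nodes whose lease is still valid at 'now', in input order. */
    static List<String> activeIds(List<ClusterNodeInfo> nodes, long now) {
        List<String> result = new ArrayList<>();
        for (ClusterNodeInfo node : nodes) {
            if (isAlive(node, now)) {
                result.add(node.slingId);
            }
        }
        return result;
    }

    /**
     * Returns the active nodes that did not take part in a vote. These are
     * candidates for the 'silent instance due to deactivated discovery bundle' case.
     */
    static List<String> silentIds(List<String> activeIds, Set<String> voters) {
        List<String> result = new ArrayList<>();
        for (String id : activeIds) {
            if (!voters.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }
}
```

How a silent instance should then be treated (waiting, excluding it from the view, or something else) is left open here, as it is in the description.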



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
