Re: oak-documentMk based discovery.impl / SLING-4603

Stefan Egli Thu, 16 Apr 2015 01:03:35 -0700

Hi Felix,

On 4/13/15 5:31 PM, "Stefan Egli" <[email protected]> wrote:


>Hi Felix,
>
>On 4/10/15 10:53 AM, "Felix Meschberger" <[email protected]> wrote:
>
>>  * It depends on a specific implementation detail of a specific
>>    Oak MK/NodeStore implementation. This implementation may
>>    change at any time
>>  * This feature seems to be accessed through JMX which exposes
>>    an admin interface which is not guaranteed to be stable
>>    for regular programmatic use
>
>Agreed (except that it's a trade-off against the advantage gained, as
>pointed out below).

After some more brainstorming, I think we should go back to the
suggestions that were floated around originally by MichaelM and again by
Chetan (the original suggestion is indeed brittle):

Let oak-mongoMk expose a mongo connection (to make sure we reuse an
existing connection and avoid any (authentication) configuration on the
discovery layer), and let discovery.mongo store heartbeats directly, raw,
in a separate mongo collection, bypassing and independent of any oak
code. This collection would hold the heartbeats and establishedViews;
properties and announcements can remain in JCR. Thus discovery would be
completely separated from oak: the only remaining link is the exposed
mongo connection.
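
To make the proposal a bit more concrete, here is a minimal sketch of the
liveness check discovery.mongo could run over such a heartbeats
collection. It is only an illustration under assumptions: the class name
HeartbeatCollection, the slingId keys and the in-memory map (standing in
for the raw mongo collection reached through the connection oak-mongoMk
would expose) are all hypothetical, and no mongo driver API is used.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical stand-in for the separate mongo collection holding raw
 * heartbeats: one entry per Sling instance (keyed by slingId) with the
 * time of its latest heartbeat. A real discovery.mongo would instead
 * upsert documents via the connection exposed by oak-mongoMk.
 */
class HeartbeatCollection {
    private final Map<String, Long> lastHeartbeatMillis = new ConcurrentHashMap<>();

    /** Record a heartbeat, keeping the newest timestamp per instance. */
    void heartbeat(String slingId, long nowMillis) {
        lastHeartbeatMillis.merge(slingId, nowMillis, Long::max);
    }

    /** Instances whose latest heartbeat is younger than the timeout, sorted by slingId. */
    Set<String> liveInstances(long nowMillis, long timeoutMillis) {
        Set<String> live = new TreeSet<>();
        lastHeartbeatMillis.forEach((slingId, ts) -> {
            if (nowMillis - ts < timeoutMillis) {
                live.add(slingId);
            }
        });
        return live;
    }

    public static void main(String[] args) {
        HeartbeatCollection heartbeats = new HeartbeatCollection();
        heartbeats.heartbeat("instance-a", 1_000);
        heartbeats.heartbeat("instance-b", 1_000);
        heartbeats.heartbeat("instance-a", 6_000); // only instance-a keeps beating
        System.out.println(heartbeats.liveInstances(8_000, 5_000)); // [instance-a]
    }
}
```

Note that nothing in this check touches oak: whether an instance counts as
alive depends only on what is in that one collection.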

This in my view would address the first two concerns.

>>  * It limits the topology to the Oak cluster members
>
>Not exactly. The idea was to use (embed) most of the functionality of
>discovery.impl - ie reuse the topology connectors et al. So cross-cluster
>would work exactly the same as with discovery.impl.
>
>>  * This looks like hacking around a problem in Oak leveraging
>>    other parts of Oak which seem to have issues in themselves ?
>
>Or, stated slightly differently: Oak's document-based clustering comes
>with an eventual consistency model. By design, this incorporates a
>certain, undefined delay before writes from one node become visible to
>others. In such a model it is unclear what the largest delay could be
>under all circumstances, and thus what a proper heartbeat timeout should
>be configured to. So by making use of these ActiveClusterNodes, this
>'eventual consistency' (ie its delay) can be completely avoided and thus
>the algorithm becomes much more deterministic.

As stated, I believe mongoMk being eventually consistent is the problem,
and a maximum delay cannot be guaranteed. Hence, in my view, it is more
than a hack. Even if current delays are enlarged due to an oak bug, the
eventual consistency will always be there. So the risk of configuring a
heartbeat timeout that is too low will also always be there (or, in other
words: due to eventual consistency delays you can hardly detect node
crashes in a timely manner).
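
A toy calculation illustrates why no fixed timeout is safe. Assume
(hypothetically) that an instance writes a heartbeat every interval and
that each write becomes visible to the other nodes only after a
propagation delay. The newest heartbeat an observer can see from a healthy
instance can then be almost interval + delay old, so any timeout smaller
than that risks declaring a live instance crashed. The model and its names
are illustrative assumptions, not oak's actual behaviour.

```java
/**
 * Toy model (an assumption, not oak's actual behaviour): a healthy instance
 * writes a heartbeat every intervalMillis and each write becomes visible to
 * the other cluster nodes only after delayMillis. An observer declares the
 * instance crashed once the newest heartbeat it can see is older than
 * timeoutMillis.
 */
class EventualHeartbeatModel {

    /**
     * Upper bound on the age of the newest visible heartbeat of a healthy
     * instance: right before the next write becomes visible, the visible
     * one is almost intervalMillis + delayMillis old.
     */
    static long worstCaseVisibleAge(long intervalMillis, long delayMillis) {
        return intervalMillis + delayMillis;
    }

    /** True if a healthy instance could be wrongly declared crashed. */
    static boolean falseCrashPossible(long intervalMillis, long delayMillis, long timeoutMillis) {
        return worstCaseVisibleAge(intervalMillis, delayMillis) > timeoutMillis;
    }

    public static void main(String[] args) {
        long interval = 5_000;
        long timeout = 20_000;
        for (long delay : new long[] {1_000, 10_000, 30_000}) {
            System.out.println("delay=" + delay + "ms -> false crash possible: "
                    + falseCrashPossible(interval, delay, timeout));
        }
    }
}
```

With a 5s interval and a 20s timeout, delays of 1s and 10s are tolerated
but a 30s delay is not. Raising the timeout only moves that threshold, and
it also means a real crash is noticed later.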


>PS: Discussed this offline today with Carsten/MichaelM: we should in any
>case finally implement a fix for the long-standing SLING-3432 - this should be
>a big improvement to discovery.impl - and it would apply to any
>discovery.* implementation. I've added a comment to SLING-3432.

This one is still under discussion, namely what to do with the isolated
mode (which in my view is not part of the API).

>PPS: I'll create another follow-up ticket for discovery which will be
>about 'proper synchronizing between sending of topology_changed event and
>the fact that the underlying repository is eventually consistent'. This
>currently is automatically handled in discovery.impl (as it is based on
>the repository and thus incorporates this "eventual-ness") - but any other
>discovery implementation (eg etcd/zookeeper/documentnodestore-based) that
>circumvents the repository must watch out for this.

For this one (SLING-4627) it is also yet to be discussed whether it
belongs in discovery or in the code of discovery's actual users.
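
If it ends up in discovery, one possible shape is sketched below under
assumptions: a discovery implementation that bypasses the repository could
hold back each TOPOLOGY_CHANGED notification until the local repository
has caught up with what the peers had written when the new view was
established. The class TopologyEventGate, its methods and the modelling of
repository revisions as plain long values are hypothetical; oak's real
revision model is more involved.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch for SLING-4627: delay TOPOLOGY_CHANGED until the local
 * repository has caught up with the state peers had written when the new
 * view was established. Revisions are modelled as plain, monotonically
 * increasing longs, a simplification of oak's revisions.
 */
class TopologyEventGate {
    private static final class Pending {
        final String viewId;
        final long requiredRevision;

        Pending(String viewId, long requiredRevision) {
            this.viewId = viewId;
            this.requiredRevision = requiredRevision;
        }
    }

    // the listener must not call back into the gate (it runs under the lock)
    private final Consumer<String> listener;
    private final List<Pending> pending = new ArrayList<>();
    private long visibleRevision;

    TopologyEventGate(Consumer<String> listener) {
        this.listener = listener;
    }

    /** A new view was established; peers had written up to requiredRevision. */
    synchronized void viewEstablished(String viewId, long requiredRevision) {
        pending.add(new Pending(viewId, requiredRevision));
        flush();
    }

    /** The local repository now sees all changes up to the given revision. */
    synchronized void repositoryCaughtUp(long revision) {
        visibleRevision = Math.max(visibleRevision, revision);
        flush();
    }

    /** Deliver pending events in order, stopping at the first one not yet visible. */
    private void flush() {
        Iterator<Pending> it = pending.iterator();
        while (it.hasNext()) {
            Pending next = it.next();
            if (next.requiredRevision > visibleRevision) {
                break;
            }
            it.remove();
            listener.accept(next.viewId);
        }
    }

    public static void main(String[] args) {
        List<String> delivered = new ArrayList<>();
        TopologyEventGate gate = new TopologyEventGate(delivered::add);
        gate.viewEstablished("view-2", 42); // peers had written up to revision 42
        gate.repositoryCaughtUp(40);        // local repository still lags behind
        System.out.println(delivered);      // []
        gate.repositoryCaughtUp(42);
        System.out.println(delivered);      // [view-2]
    }
}
```

The alternative, leaving this to discovery's users, would mean each
listener doing a similar catch-up check itself before acting on the event.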

>>All in all, I doubt whether the energy we put into this really is worth
>>it, given there are other valid solutions around which are sound, stable,
>>and proven, such as etcd and zookeeper.
>>Maybe we should stick with the current discovery.impl as being good
>>enough and instead concentrate on building a new discovery implementation
>>based on said proven technology. For demo and ease-of-use purposes the
>>current discovery.impl is probably sufficient. For real world uses an etcd
>>or zookeeper or whatever based solution may be more promising IMHO.
>>
>>Sorry to sound discouraging, but I am not convinced of the approach.

What about the new approach?

Cheers,
Stefan
