[
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13696876#comment-13696876
]
Stefan Egli edited comment on SLING-2939 at 7/1/13 3:30 PM:
------------------------------------------------------------
[~ianeboston], [~rombert]: regarding JGroups: I think JGroups is quite a good
fit, except for two aspects:
* large installations would typically use UDP rather than point-to-point (the
536-machine cluster, for example, used UDP multicast). I believe that we would
like to support Sling deployments across data-centers and use discovery between
those data-centers for certain admin operations. My concern here is how
feasible UDP is across data-centers.
* I think the decision can be broken down to two deployment models: embedded
or dedicated servers. With embedding you have the advantage that no additional
services are required, but you would ideally use multicast (thus running into
the above concern). With a dedicated service there is the downside of such an
additional component, but the scalability of the point-to-point setup, also
cross data-center, seems better. (Scalability not in terms of pure performance
- there multicast is best - but in terms of ease of configuration/setup.) A
sketch of the TCP-based alternative follows below this list.
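For reference, a minimal sketch of the non-multicast variant, assuming stock
JGroups: tcp.xml ships with JGroups and combines the TCP transport with TCPPING
(a static list of initial hosts, typically passed via
-Djgroups.tcpping.initial_hosts), so no UDP multicast is needed. The cluster
name "sling-discovery" and the class itself are made up for illustration:
{code:java}
import org.jgroups.JChannel;
import org.jgroups.ReceiverAdapter;
import org.jgroups.View;

public class TcpDiscoveryProbe {
    public static void main(String[] args) throws Exception {
        // tcp.xml: TCP transport + TCPPING discovery, no multicast involved,
        // which is what a cross data-center deployment would need.
        JChannel channel = new JChannel("tcp.xml");
        channel.setReceiver(new ReceiverAdapter() {
            @Override
            public void viewAccepted(View view) {
                // The JGroups view is the current, ordered membership; it
                // could back liveliness detection and instance ordering.
                System.out.println("members: " + view.getMembers());
            }
        });
        channel.connect("sling-discovery");
    }
}
{code}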
> 3rd-party based implementation of discovery.api
> -----------------------------------------------
>
> Key: SLING-2939
> URL: https://issues.apache.org/jira/browse/SLING-2939
> Project: Sling
> Issue Type: Task
> Components: Extensions
> Affects Versions: Discovery API 1.0.0
> Reporter: Stefan Egli
> Assignee: Stefan Egli
>
> The Sling Discovery API introduces the abstraction of a topology which
> contains (Sling) clusters and instances, supports liveliness-detection,
> leader-election within a cluster and property-propagation between the
> instances. As a default and reference implementation a resource-based, OOTB
> implementation was created (org.apache.sling.discovery.impl).
> Pros and cons of the discovery.impl
> Although the discovery.impl supports everything required in discovery.api, it
> has a few limitations. Here's a list of pros and cons:
> Pros
> * No additional software required (leverages the repository for intra-cluster
> communication/storage and HTTP-REST calls for cross-cluster communication)
> * Very small footprint
> * Perfectly suited for a single cluster or instance and for small, rather
> stable hub-based topologies
> Cons
> * Config-/deployment-limitations (aka embedded-limitation): connections
> between clusters are peer-to-peer and explicit. To span a topology, a number
> of instances must be made known to each other, and changes in the topology
> typically require config adjustments to guarantee high availability of the
> discovery service
> ** except if a natural "hub cluster" exists that can serve as connection
> point for all "satellite clusters"
> ** other than that, it is less suited for large and/or dynamic topologies
> * Change propagation (for topology parts reported via connectors) is
> non-atomic and slow, hop-by-hop based
> * No guarantee on the order of TopologyEvents sent on individual instances -
> ie different instances might see different orders of TopologyEvents (ie
> changes in the topology), but eventually the topology is guaranteed to be
> consistent
> * Robustness of discovery.impl wrt storm situations depends on the robustness
> of the underlying cluster (not a real negative, but discovery.impl might in
> theory unveil repository bugs which would otherwise not have been a problem)
> * Rather new, little-tested code which might have issues with edge cases
> wrt network problems
> ** although partitioning-support is not a requirement per se, similar
> edge-cases might exist wrt network-delays/timing/crashes
> Reusing a suitable 3rd party library
> To provide an additional option as an implementation of the discovery.api,
> one idea is to use a suitable 3rd party library.
> Requirements
> The following is a list of requirements a 3rd party library must support:
> * liveliness detection: detect whether an instance is up and running
> * stable leader election within a cluster: stable describes the fact that a
> leader will remain leader until it leaves/crashes, and no new, joining
> instance shall take over while a leader exists
> * stable instance ordering: the list of instances within a cluster is
> ordered and stable; new, joining instances are put at the end of the list
> * property propagation: propagate the properties provided within one
> instance to everybody in the topology. There are no timing requirements bound
> to this; the intention is not to use it as messaging but to announce config
> parameters to the topology
> * support large, dynamic clusters: configuration of the new discovery
> implementation should be easy and support frequent changes in the (large)
> topology
> * no single point of failure: this is obvious, there should of course be no
> single point of failure in the setup
> * embedded or dedicated: this might be a hot topic: embedding a library has
> the advantage of not having to install anything additional. A dedicated
> service on the other hand requires additional handling in deployment.
> Embedding implies a peer-to-peer setup: nodes communicate peer-to-peer rather
> than via a centralized service. This IMHO is a negative for large topologies,
> which would typically span data-centers. Hence a dedicated service could be
> seen as an advantage in the end.
> * due to the need for cross data-center deployments, the transport protocol
> must be TCP (or HTTP for that matter)
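To make the contract quoted above concrete, here is a minimal sketch of a
consumer, based on the published org.apache.sling.discovery interfaces
(TopologyEventListener, TopologyEvent, ClusterView, InstanceDescription); the
class itself and its logging are made up, and in practice it would be
registered as an OSGi service of type TopologyEventListener. Any
implementation, whether discovery.impl or a 3rd-party based one, must deliver
the same events to such a component:
{code:java}
import java.util.List;

import org.apache.sling.discovery.ClusterView;
import org.apache.sling.discovery.InstanceDescription;
import org.apache.sling.discovery.TopologyEvent;
import org.apache.sling.discovery.TopologyEventListener;

public class LoggingTopologyListener implements TopologyEventListener {

    @Override
    public void handleTopologyEvent(TopologyEvent event) {
        if (event.getType() == TopologyEvent.Type.TOPOLOGY_CHANGING) {
            return; // old view no longer current, new view not yet known
        }
        ClusterView cluster =
                event.getNewView().getLocalInstance().getClusterView();
        // Stable instance ordering: the list keeps its order across events,
        // newly joining instances are appended at the end.
        List<InstanceDescription> instances = cluster.getInstances();
        for (InstanceDescription instance : instances) {
            // Stable leader election: exactly one instance per cluster is
            // leader and remains so until it leaves/crashes.
            System.out.println(instance.getSlingId()
                    + (instance.isLeader() ? " (leader)" : ""));
        }
    }
}
{code}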