[
https://issues.apache.org/jira/browse/SLING-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13699411#comment-13699411
]
Robert Munteanu commented on SLING-2939:
----------------------------------------
[~egli] - JGroups can function well with UDP or TCP. I actually switched to TCP
at some point in my cluster for more reliability, but I guess things have
changed in the last 4 years.
As for dedicated servers vs embedded, since JGroups is easily embeddable I
solved this problem by running a mini-app which just embedded JGroups and
(IIRC) was designated to be the JGroups coordinator. This application being
always available and under low load, it was a perfect fit for a coordinator.
The details are a bit unclear to me right now, but I can dig them up if needed.
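The TCP-based setup mentioned above can be sketched as a JGroups protocol stack, roughly along the lines of the tcp.xml shipped with JGroups. This is only an illustrative fragment; the hosts and ports are assumptions, and a real deployment would tune the individual protocols:

```xml
<config xmlns="urn:org:jgroups">
    <!-- TCP transport instead of UDP multicast -->
    <TCP bind_port="7800"/>
    <!-- static discovery: the known members are listed explicitly -->
    <TCPPING initial_hosts="hostA[7800],hostB[7800]" port_range="1"/>
    <MERGE3/>
    <FD_SOCK/>
    <FD_ALL/>
    <VERIFY_SUSPECT/>
    <pbcast.NAKACK2 use_mcast_xmit="false"/>
    <UNICAST3/>
    <pbcast.STABLE/>
    <pbcast.GMS/>
</config>
```

With a stack like this, the dedicated low-load mini-app Robert describes would simply be the first member to connect, making it the JGroups coordinator.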
> 3rd-party based implementation of discovery.api
> -----------------------------------------------
>
> Key: SLING-2939
> URL: https://issues.apache.org/jira/browse/SLING-2939
> Project: Sling
> Issue Type: Task
> Components: Extensions
> Affects Versions: Discovery API 1.0.0
> Reporter: Stefan Egli
> Assignee: Stefan Egli
>
> The Sling Discovery API introduces the abstraction of a topology which
> contains (Sling) clusters and instances, supports liveliness-detection,
> leader-election within a cluster and property-propagation between the
> instances. As a default and reference implementation a resource-based, OOTB
> implementation was created (org.apache.sling.discovery.impl).
> Pros and cons of the discovery.impl
> Although the discovery.impl supports everything required in discovery.api, it
> has a few limitations. Here's a list of pros and cons:
> Pros
> No additional software required (leverages repository for intra-cluster
> communication/storage and HTTP-REST calls for cross-cluster communication)
> Very small footprint
> Well suited for a single cluster or instance and for small, rather
> stable hub-based topologies
> Cons
> Config-/deployment-limitations (aka embedded-limitation): connections
> between clusters are peer-to-peer and explicit. To span a topology, a number
> of instances must be made known to each other; changes in the topology
> typically require config adjustments to guarantee high availability of the
> discovery service
> Except when a natural "hub cluster" exists that can serve as the
> connection point for all "satellite clusters"
> Other than that, it is less suited for large and/or dynamic topologies
> Change propagation (for topology parts reported via connectors) is
> non-atomic and slow, hop-by-hop based
> No guarantee on the order of TopologyEvents sent to individual instances,
> i.e. different instances might see TopologyEvents (changes in the topology)
> in different orders, but eventually the topology is guaranteed to be consistent
> Robustness of discovery.impl with respect to storm situations depends on
> the robustness of the underlying cluster (not a real negative, but
> discovery.impl might in theory unveil repository bugs which would otherwise
> not have been a problem)
> Rather new, little-tested code which might have issues with edge cases
> related to network problems
> although partitioning support is not a requirement per se, similar
> edge cases might exist with respect to network delays/timing/crashes
> Reusing a suitable 3rd party library
> To provide an additional option as implementation of the discovery.api one
> idea is to use a suitable 3rd party library.
> Requirements
> The following is a list of requirements a 3rd party library must support:
> liveliness detection: detect whether an instance is up and running
> stable leader election within a cluster: stable describes the fact that a
> leader will remain leader until it leaves/crashes and no new, joining
> instance shall take over while a leader exists
> stable instance ordering: the list of instances within a cluster is
> ordered and stable, new, joining instances are put at the end of the list
> property propagation: propagate the properties provided by one
> instance to everybody in the topology. There are no hard timing requirements
> bound to this; the intention is not to be used as messaging but to
> announce config parameters to the topology
> support large, dynamic clusters: configuration of the new discovery
> implementation should be easy and support frequent changes in the (large)
> topology
> no single point of failure: obviously, the setup must not contain a
> single point of failure
> embedded or dedicated: this might be a hot topic. Embedding a library has
> the advantage of not requiring any additional installation, while a
> dedicated service requires additional handling in deployment.
> Embedding implies a peer-to-peer setup: nodes communicate peer-to-peer rather
> than via a centralized service. This IMHO is a negative for large topologies,
> which would typically span data centers. Hence a dedicated service could
> be seen as an advantage in the end.
> due to the need for cross data-center deployments, the transport protocol
> must be TCP (or HTTP for that matter)
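To make the "stable" requirements in the list above concrete, here is a minimal toy model in plain Java. It is illustrative only (not Sling or JGroups API): joiners are appended so the existing order never changes, and the leader only changes when the current leader leaves or crashes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the stable-ordering / stable-leader semantics required
// of a 3rd-party library. A real implementation would derive this from
// the library's membership views.
public class StableView {
    private final List<String> members = new ArrayList<>();

    // New instances are appended at the end; re-joins do not reorder.
    public void join(String id) {
        if (!members.contains(id)) {
            members.add(id);
        }
    }

    // Removing an instance preserves the relative order of the rest.
    public void leave(String id) {
        members.remove(id);
    }

    // The leader is the head of the list; it only changes on leave/crash,
    // never because a new instance joined.
    public String leader() {
        return members.isEmpty() ? null : members.get(0);
    }

    public List<String> ordered() {
        return List.copyOf(members);
    }
}
```

For example, after instances a, b, c join, a is leader; a newly joining d is appended at the end and a stays leader until it leaves, at which point b takes over.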
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira