[EMAIL PROTECTED] wrote:

Hello

My concern regarding the clustering is that the mechanism itself is more
general than session replication alone. (Un-)fortunately HTTP (Web)
is not the only interface in the cluster. From our perspective, all
clustered facilities should be based on the same mechanism in the solution,
as otherwise the behaviour of the system is hardly predictable. So, if
we base some application/service distribution model on the assumption
that sub-partitioning is possible, we may end up with interesting (from a
technical perspective) problems: multiple services, instead of a
single one, reporting contradictory values.
I see two points here:

1) That whatever solution we come up with to deal with fragmentation should be applicable to as many different areas of Geronimo clustering as possible.

I agree wholeheartedly. Whilst the ideas I threw out were for WADI - and, by extension, any form of session management (i.e. OpenEJB etc.) - they were not intended to imply that this was the only problem. It is just that problems involving clustered state are often the most difficult to deal with in terms of scalability and availability. We need to work through the list of functional areas that we are compiling and decide how each one should respond to cluster fragmentation and which approaches can be shared.

2) You have illustrated the fragmentation issue with a particular use case: singleton services. That is my reading of your example - I hope I haven't misunderstood.

I'm not sure that I actually see the problem with singleton services in this case, but I guess it depends on how they are elected. I would expect all fragments that find themselves running without a required service would elect one node to perform it. As each fragment merged and realised that it had two instances of the same singleton service, one of those instances would be de-elected. By the time the whole cluster had reformed, only one instance of the service would remain.
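The elect/de-elect cycle described above could be sketched like this. Everything here is hypothetical - the node ids, the deterministic election rule (lowest node id wins), and the function names are assumptions for illustration, not anything WADI or Geronimo implements:

```python
# Hypothetical sketch of singleton election/de-election across a fragment
# merge: each fragment elects the member with the lowest node id; when
# fragments merge, re-running the same deterministic rule over the combined
# membership de-elects the surplus instance.

def elect(members):
    """Pick the singleton host deterministically (lowest node id wins)."""
    return min(members)

def merge_fragments(frag_a, frag_b):
    """On merge, re-run the election over the union of members; whichever
    previous holder is not re-elected stands down."""
    return elect(frag_a | frag_b)

# Two isolated fragments each elected their own singleton host...
assert elect({"node-3", "node-5"}) == "node-3"
assert elect({"node-1", "node-4"}) == "node-1"
# ...after the merge only one instance survives.
assert merge_fragments({"node-3", "node-5"}, {"node-1", "node-4"}) == "node-1"
```

The key property is that the rule is deterministic over membership, so no coordination beyond agreeing on the member list is needed to converge on a single instance.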

Having said all of this, I would be much more in favour of an architecture which did not use singleton services at all. They represent a single point of failure and contention. If there is a way to partition the service, or run a number of instances, I think that this would be preferable. Ideally, I would like to see it partitioned to the point that every node carried a piece of the service and could be self-sufficient if it suddenly became isolated from the others. The architecture behind WADI's distributed hash table works like this. A node should only allocate session ids which map to buckets/partitions of which it is the owner, thus a session may be born, live and die on a single node (but be available to all) without that node having to talk to any other node (except for replication traffic - but the session need not be replicated in order to be distributable/migratable to other nodes).
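A minimal sketch of that id-allocation idea - generate session ids until one hashes into a locally-owned bucket, so the session never needs a remote call at birth. The bucket count, hash function, and names are assumptions for illustration, not WADI's actual implementation:

```python
import hashlib
import uuid

NUM_BUCKETS = 24  # assumed fixed partition count

def bucket_of(session_id):
    """Map a session id onto one of the fixed buckets."""
    digest = hashlib.sha1(session_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS

def allocate_local_session_id(owned_buckets):
    """Keep generating ids until one lands in a bucket this node owns,
    so the session can be born, live and die here without the node
    having to talk to any other node."""
    while True:
        sid = uuid.uuid4().hex
        if bucket_of(sid) in owned_buckets:
            return sid

owned = set(range(0, 8))  # suppose this node owns buckets 0-7
sid = allocate_local_session_id(owned)
assert bucket_of(sid) in owned
```

Since any node can compute `bucket_of(session_id)` for any id, the session remains locatable (and therefore distributable/migratable) cluster-wide even though it was allocated purely locally.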

In our case the fragmentation of the cluster would mean that
all fragments would try to reboot all the other fragments using
management interfaces :) A true nightmare...
Sounds very nasty :-)

Besides, it is much easier to maintain/predict the cluster behaviour
when a node is considered active only while it can reliably reach a certain
(central) cluster network service. This is probably different from the
traditional approach, but from our perspective it is better to lose the
whole service than to get something unpredictable. The reason is that in
both cases it is reported as a system outage, but in the second one it
is much more difficult to detect/analyse/fix.
Agreed - and perhaps this could be one form of pluggable membership-tracking strategy, sitting in the clustering substrate. This would mean that in the case of fragmentation, only those nodes remaining in the same fragment as the 'master' node would continue normally. All the others, on losing contact with this node, would decide that they had fallen out of the cluster and seek to re-establish a connection - hopefully refusing to service any requests (and therefore maintaining the consistency of the clustered service) until they had rejoined the surviving fragment. As you have mentioned, you would have to make absolutely sure of the availability of this 'master' node, otherwise you would lose your whole cluster.
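To make the "pluggable strategy" idea concrete, here is a hypothetical sketch of such an interface, with the master-reachability policy alongside a majority-quorum alternative. All class and method names are invented for illustration:

```python
# Hypothetical pluggable membership-tracking strategies: a node only
# considers itself an active cluster member while its strategy says so,
# and refuses to service requests otherwise.

class MembershipStrategy:
    def is_active(self, reachable_nodes):
        raise NotImplementedError

class MasterReachabilityStrategy(MembershipStrategy):
    """The central-service model: a node is active only while it can
    reach the designated master node."""
    def __init__(self, master):
        self.master = master
    def is_active(self, reachable_nodes):
        return self.master in reachable_nodes

class MajorityQuorumStrategy(MembershipStrategy):
    """An alternative: a node stays active while its fragment contains
    a majority of the original cluster membership."""
    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
    def is_active(self, reachable_nodes):
        return len(reachable_nodes) > self.cluster_size // 2

# A fragment cut off from the master stops servicing requests...
assert not MasterReachabilityStrategy("node-1").is_active({"node-2", "node-3"})
# ...whereas a quorum strategy keeps a 3-of-4 fragment alive.
assert MajorityQuorumStrategy(4).is_active({"node-2", "node-3", "node-4"})
```

Swapping the strategy changes the cluster's fragmentation behaviour without touching the clustered services themselves, which is exactly the abstraction being argued for.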

With a model like this, we could describe your architecture, the jgroups architecture and a number of other possibilities, whilst the issue of membership remains abstracted away from the clustered services themselves...

How does that sound ?


Jules

-valeri


-----Original Message-----
From: ext Jules Gosnell [mailto:[EMAIL PROTECTED]]
Sent: 19 October, 2005 13:51
To: [email protected]; [EMAIL PROTECTED]
Subject: Re: Clustering - JGroups issues and others

Thanks for coming back, Valeri.

You have put your finger fairly and squarely on the cluster implementer's nightmare :-)

This really is a thorny problem which I keep coming back to. I'm assuming that if the cluster becomes fragmented into different subgroups (mapping to h/w enclosures etc.), and these can all still see common backend services but not the other peer groups, then e.g. the h/w load-balancer in a web deployment may still be able to see all nodes in all groups? Since traffic is still arriving at more than one cluster fragment, all sorts of problems may arise.

I guess WADI might do something like this :

The cluster fragments...

Each fragment would find that it had an incomplete set of buckets/partitions (WADI's architecture is to partition the session space into a fixed number of buckets and share responsibility for these between the cluster members).

Each fragment would have to assume that the missing partitions had been lost and would not be rejoining (in case this were really the case), so the missing partitions would have to be resurrected and repopulated with sessions drawn from replicated copies. Thus each fragment would end up with a complete set of partitions.
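The repair step each fragment would perform might look roughly like this. The fixed bucket count, the round-robin rebalancing rule, and the shape of the replica store are all assumptions for illustration:

```python
NUM_BUCKETS = 24  # assumed fixed partition count

def repair_partitions(held, members, replicas):
    """Resurrect the buckets a fragment has lost: spread them round-robin
    over the surviving members and repopulate each from replicated session
    copies. `held` maps bucket -> state, `replicas` maps bucket -> the
    replicated session ids available within this fragment."""
    missing = [b for b in range(NUM_BUCKETS) if b not in held]
    members = sorted(members)
    for i, bucket in enumerate(missing):
        held[bucket] = {
            "owner": members[i % len(members)],
            "sessions": set(replicas.get(bucket, ())),
        }
    return held

# A fragment survives with buckets 0-11; buckets 12-23 were held elsewhere.
held = {b: {"owner": "node-1", "sessions": set()} for b in range(12)}
replicas = {b: {f"session-{b}"} for b in range(12, 24)}
repaired = repair_partitions(held, {"node-1", "node-2"}, replicas)
assert sorted(repaired) == list(range(NUM_BUCKETS))  # complete bucket set again
assert repaired[12]["sessions"] == {"session-12"}    # repopulated from replicas
```

Note that each fragment only sees the replicas held within itself, which is precisely why the resulting session sets of different fragments end up incomplete and intersecting, as described below.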

Each fragment would be likely to end up with an incomplete session set that intersected with the session set held by other fragments (since it is likely that not all sessions could be resurrected, and some would be resurrected within more than one fragment).

Assuming (and I think we would have to make this a hard requirement) that the load-balancer supported session affinity correctly, requests would continue to be directed to the node holding the original (not
resurrected) version of their session.

So, at this point, we have survived the fragmentation and are still fully available to our clients, although there may have been quite a lag whilst partitions were rebuilt/repopulated, and the footprint of each node has probably increased, since each fragment now carries a larger proportion of the original cluster's sessions than it did originally (the session sets intersect).

Then, the network comes back :-)

Each fragment would become aware of the other fragments. Multiple copies of partitions and sessions would now exist within the same cluster.

Multiple instances of the same partition can be merged by simply taking the union of the session sets that they manage.

Merging multiple instances of the same session is a bit more awkward. If sessions carried some sort of version (HttpSessions carry a LastAccessedTime field), then all instances with the same 'version' can be collapsed. I guess we then move on to a pluggable strategy of some sort. The simplest of these would probably just assume that only one copy of the session has been involved in a dialogue with the client since the fracture, since the client was 'stuck' to its node. If this is the case, then the copies with the lower version will all be snapshots of the original session taken at the point of fracture; they will not have diverged further and so may safely be discarded (we may be able to remember/deduce the time of fracture and discard any copy with a LastAccessedTime before that point), leaving only the original session to continue. If divergence has occurred, then some custom, application-space code might be run that can use application-level knowledge to merge the various session versions. But I think that if we have got to this stage, then we are in real trouble and should probably just declare an error and drop the session.
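The merge-on-rejoin logic above could be sketched as two small steps: partition copies merge by set union, and session copies collapse by treating LastAccessedTime as a version. This is a sketch under the stated affinity assumption, with invented names, not implemented WADI behaviour:

```python
def merge_partition(copies):
    """Multiple instances of the same partition merge by simply taking
    the union of the session-id sets they manage."""
    merged = set()
    for sessions in copies:
        merged |= sessions
    return merged

def merge_session(copies):
    """Collapse copies of one session, using last-accessed-time as the
    version. Assuming session affinity held, only one copy kept talking
    to the client after the fracture; lower-versioned copies are just
    snapshots taken at the point of fracture and are safely discarded."""
    return max(copies, key=lambda c: c["last_accessed"])

# Two fragments each resurrected an overlapping partition...
assert merge_partition([{"s1", "s2"}, {"s2", "s3"}]) == {"s1", "s2", "s3"}

# ...and two copies of one session collapse to the live one.
copies = [
    {"last_accessed": 100, "data": "snapshot at point of fracture"},
    {"last_accessed": 250, "data": "kept diverging on its sticky node"},
]
assert merge_session(copies)["last_accessed"] == 250
```

If the affinity assumption fails and two copies have both advanced past the fracture, `merge_session` as written would silently pick one; that is the point at which the pluggable, application-space merge (or declaring an error and dropping the session) would have to take over.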

None of this is yet implemented in WADI, but it is stuff that I dream/have-nightmares about when I get too geeky :-) I hope to put some of this functionality in at some point.


What sort of frequency might this type of scenario occur with? It will be a lot of work to protect against it, but I realise that a truly enterprise-level solution must be able to survive this sort of thing.

If anyone else has had thoughts about surviving cluster fragmentation, I would be delighted to hear them.



Jules




--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."

/**********************************
* Jules Gosnell
* Partner
* Core Developers Network (Europe)
*
*    www.coredevelopers.net
*
* Open Source Training & Support.
**********************************/
