Dave Colasurdo wrote:
Jules Gosnell wrote:
Jeff Genender wrote:
Now that we have achieved the coveted J2EE Certification, we need
to start thinking about some of the things we will need to have in
Geronimo in order for it to be widely adopted by the enterprise.
IMHO, one of the biggest holes is clustering. Many companies have a
pressing need for it, and I believe that until we get a powerful
clustering solution into G, it will not be taken seriously as a J2EE
contender.
So, with that said, I wanted to start a discussion thread on
clustering and what we need to do to get this into Geronimo. I
personally would like to be involved in this (thus the reason for me
starting this thread) - yeah, since Tomcat is done, now I am bored ;-).
I was going over the lists and emails and had some great discussion
with Jules on the WADI project he has built. This seems compelling
to me. I also noticed Active Cluster as a possibility.
So let's start from the top. Do we use an already available
clustering engine or do we roll our own? Here is a small list of
choices I have reviewed, and it is by no means complete...
1) WADI
2) Active Cluster
3) Leverage the Tomcat Clustering engine
So here are some of my questions...
How complete are WADI and Active Cluster? Both look interesting to
me. My only concern with Active Cluster is that it seems to be JMS
based, which I think may be slow for high-performance clustering (am
I incorrect on this?). How mature is WADI?
Here is a status report on WADI.
I'm developing it full time.
A snapshot is available at wadi.codehaus.org - documentation is in
the wiki - at the moment the documentation (rather minimalist) is
more up to date than the snapshot, but I will try to get a fresh one
out next week.
WADI is a pluggable HttpSession Manager replacement for Tomcat-5.0/5.5
and Jetty-5.1/6.0 (it can actually migrate sessions between all four
in the same cluster).
It comprises a vertical stack of pluggable caches/stores (memory,
local disc, db, etc.) through which sessions are demoted as they age
and promoted as and when required to service a request.
Can you please clarify the purpose of promotion/demotion of
HttpSessions? Is this a mechanism to age old entries out of the cache?
I envisage a typically configured stack to look like this:
memory<->localDisc<->cluster<->db.
The db is only used to load sessions if you are the first cluster member
to start, or to store them if you are the last cluster member to stop.
The cluster store gives you access to the sessions held on every other
node in the cluster (more about this later).
The localDisc is where sessions are paged out by a pluggable eviction
strategy running in the memory store (currently based on inactivity,
but it could take into account the number of sessions in memory).
Memory is where sessions and requests are combined in the rendering of
pages.
A request enters the top of the stack and travels downwards towards the
cluster store, until its session is found (or not), at which point the
session is promoted into memory and the request rendered. The session
will stay in memory until evicted downwards, explicitly invalidated, or
(if the eviction strategy is e.g. NeverEvict) implicitly invalidated due
to timeout.
How does this relate to HttpSession inactivity timeouts?
Orthogonally. I thought about pushing timed-out sessions into another
store so that they could be data-mined, but figured that once a session
had been destroyed (all sorts of listeners might have fired) it would be
asking for trouble to try to serialise it. If the application wanted to
keep an archived copy, it could do it via one of these listeners. The
evicters are just there to spool stuff out onto e.g. disc, so that you
can hold larger sessions for more clients.
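As an illustration, such an archived copy could be taken with a
bog-standard HttpSessionListener; SessionArchive and the 'basket'
attribute below are made up:

    import javax.servlet.http.HttpSessionEvent;
    import javax.servlet.http.HttpSessionListener;

    // Sketch of application-level archiving via the standard listener
    // API. SessionArchive is an imaginary application class.
    public class ArchivingListener implements HttpSessionListener {

        public void sessionCreated(HttpSessionEvent se) {
            // nothing to do
        }

        public void sessionDestroyed(HttpSessionEvent se) {
            // Copy out whatever the application wants to data-mine
            // before the session disappears for good.
            SessionArchive.record(se.getSession().getId(),
                                  se.getSession().getAttribute("basket"));
        }
    }

    class SessionArchive {
        static void record(String id, Object attribute) {
            // e.g. write to a database or log; trivial stand-in here
            System.out.println("archived " + id + ": " + attribute);
        }
    }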
Is the cache size configurable?
It would be up to the evicter to use the number of local entries in its
eviction algorithm. I don't have an evicter that does this currently,
but I don't think it would be hard to write one.
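Something along these lines, perhaps (the Evicter interface here is
invented for illustration and will not match WADI's real one):

    import javax.servlet.http.HttpSession;

    // Illustrative only - not WADI's real Evicter interface.
    interface Evicter {
        boolean shouldEvict(HttpSession session, int entriesInStore,
                            long nowMillis);
    }

    // Evicts on inactivity, as now, but also when the store holds too
    // many entries.
    class CountAndAgeEvicter implements Evicter {
        private final int maxEntries;
        private final long maxIdleMillis;

        CountAndAgeEvicter(int maxEntries, long maxIdleMillis) {
            this.maxEntries = maxEntries;
            this.maxIdleMillis = maxIdleMillis;
        }

        public boolean shouldEvict(HttpSession s, int entriesInStore,
                                   long nowMillis) {
            boolean idle = (nowMillis - s.getLastAccessedTime()) > maxIdleMillis;
            boolean full = entriesInStore > maxEntries;
            return idle || full; // spool out when stale, or when memory is crowded
        }
    }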
This stack may be connected horizontally to a cluster by inserting a
clustered store, which uses a distributed hash table (currently
un-replicated, but I am working on it) to share state around the
cluster's members in a scalable manner. WADI has a working mod_jk
integration.
Does this mean that each cluster member shares its HttpSession data
with all of the other members (1->all), or is there the notion of
limiting the HttpSession replication to one (or a few) designated
partners?
This is the most interesting and challenging part of WADI. I learnt from
my early experiences with HttpSession distribution that 1->all
replication is simply a no-go. The point of having a large number of
nodes in a cluster is that availability should improve with every node
you add - with average node availability a, the chance of all n nodes
holding a session being down at once is (1-a)^n, so session availability
is 1-(1-a)^n - not that your architecture forces you to partition your
cluster into n/2 two-node "micro-clusters", capping your availability at
1-(1-a)^2 no matter how many nodes you add (with a = 99%, that is the
difference between e.g. 99.9999% for n = 3 and 99.99%).
There are two distinct issues to deal with - location and replication.
I have solved the first issue (although the code is not ready for prime
time yet). The cluster has a fixed number of 'buckets'. Responsibility
for these buckets is divided between the cluster members, and redivided
on membership changes. Each bucket contains a map of
session-id:location. A Session's id is used to map it to a bucket.
Sessions are free to live on any node in the cluster. If a session is
created/destroyed/migrated, its bucket owner is notified. Requests are
expected to fall, 99 times out of 100, on the node holding the relevant
session, so in this case everything will happen in-vm. Occasionally, due
to node maintenance, load-balancer confusion, etc., requests will fall
elsewhere.
In this case the receiving node can ask the bucket owner for the
session's location and either redirect/proxy the request to the session,
or migrate the session in underneath the incoming request. Since only
one node needs to be informed of a session's location, migrating a
session does not need to involve notification to every cluster member of
the new location.
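Roughly, in code (all names invented, and the remote messaging elided):

    import java.util.HashMap;
    import java.util.Map;

    // Illustration of the fixed-bucket location scheme described above.
    class LocationService {
        static final int NUM_BUCKETS = 1024; // fixed for the cluster's life

        // The subset of buckets this node currently owns; each owned
        // bucket maps session-id -> the node currently holding it.
        private final Map<Integer, Map<String, String>> ownedBuckets =
                new HashMap<>();

        static int bucketFor(String sessionId) {
            return Math.abs(sessionId.hashCode() % NUM_BUCKETS);
        }

        // Called when a session is created/destroyed/migrated; only this
        // one bucket owner needs to hear about it, not the whole cluster.
        void sessionMoved(String sessionId, String newNode) {
            Map<String, String> bucket =
                    ownedBuckets.get(bucketFor(sessionId));
            if (bucket != null) {             // we own this session's bucket
                if (newNode == null) {
                    bucket.remove(sessionId); // session destroyed
                } else {
                    bucket.put(sessionId, newNode);
                }
            }
        }

        // A node receiving a request for a session it does not hold asks
        // the bucket owner where the session lives, then redirects,
        // proxies, or migrates the session in underneath the request.
        String locate(String sessionId) {
            Map<String, String> bucket =
                    ownedBuckets.get(bucketFor(sessionId));
            return bucket == null ? null : bucket.get(sessionId);
        }
    }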
I am working on replication at the moment - here is what I envisage -
every session will be implemented via a master/primary and a number (n)
of slaves/secondaries. I expect n to usually be 1-2. Slaves will be
notified of changes to the master either synchronously or asynchronously
at some point (another pluggable strategy) after they occur. The master
and bucket owner will know the location of master and slaves. Death of
the master will result in a slave being promoted to master and another
slave being recruited. If a request should land on its session's slave,
then rather than migrate the session from its master and then find you
have to recruit a new slave (to avoid having master and slave
colocated), the slave and master may just arrange with the bucket owner
to swap roles.
This actually just describes in-vm replication, which I hope will be
just one pluggable replication backend. Other backends may include e.g.
backup to a db etc.
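As a sketch of the envisaged roles (names are mine, and the recruitment
and messaging machinery is elided):

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the envisaged primary/secondary arrangement.
    class ReplicatedSession {
        private String primary;               // services requests, owns the state
        private final List<String> secondaries =
                new ArrayList<>();            // usually 1-2 nodes

        // Death of the primary: promote a secondary, recruit a replacement.
        void onPrimaryFailure(String recruitedNode) {
            if (secondaries.isEmpty()) {
                return;                       // no copy survives; session lost
            }
            primary = secondaries.remove(0);
            secondaries.add(recruitedNode);
            // ...notify the bucket owner of the new locations...
        }

        // A request landing on a secondary: swapping roles is cheaper
        // than migrating the session and then recruiting a fresh
        // secondary to keep primary and secondary on different nodes.
        void swapRoles(String secondary) {
            if (secondaries.remove(secondary)) {
                secondaries.add(primary);
                primary = secondary;
                // ...agree the swap with the bucket owner...
            }
        }
    }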
WADI currently sits on top of ActiveCluster, which it uses for
membership notification, and ActiveMQ, which is used for transport by
both layers. ActiveMQ has pluggable protocols, including a peer://
protocol which allows peers to talk directly to one another (this
should put to bed fears of a JMS-based solution not scaling -
remember, JMS is just an API). So you do not need to choose between
WADI and ActiveCluster - they are complementary. ActiveCluster can
also (I believe) use JGroups as a transport - I haven't tried it.
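For instance, a connection over the peer transport looks no different
to the code using it than any other JMS connection (the group and
broker names below are made up, and the package name is as in current
ActiveMQ releases):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;

    import org.apache.activemq.ActiveMQConnectionFactory;

    // Minimal sketch: the peer:// transport embeds a broker in each VM
    // and lets members of the same group talk directly to one another.
    public class PeerTransportExample {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory =
                    new ActiveMQConnectionFactory("peer://wadi-demo/nodeA");
            Connection connection = factory.createConnection();
            connection.start();
            // ...publish and subscribe exactly as with any other JMS
            // provider; only the broker URL changes - JMS is just an API.
            connection.close();
        }
    }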
ActiveSpace is another technology in this area (distributed caching)
and it looks as if WADI and ActiveSpace will become more closely
aligned. So this may also be considered a complementary technology.
Both Tomcat and Jetty currently have existing clustering solutions. I
looked closely at the Tomcat solutions before starting out on WADI
and knew all about the Jetty solution, because I wrote it :-). WADI
is my answer to what I see as shortcomings in all of the existing
open source approaches to this problem-space.
Can you provide a quick high level description of the advantages of
WADI over Tomcat and Jetty clustering solutions?
Jetty uses 1->all replication over JGroups, as I believe one Tomcat
session manager does. I think the other Tomcat session manager also does
1->all replication, but over its own protocol. Perhaps Jeff can confirm
this. I think TC's 'PersistentManager' is also able to write changed
sessions out to disc at the end of the request.
1->all, for the reasons given above, will not scale. The more nodes you
add, the more notifications each will have to react to and the more
sessions it will have to hold. You are simply deferring your problems
for a little while. Your only way out is to partition the cluster and
sacrifice your availability. When WADI's in-vm replication strategy is
finished, I think that this will make it a clear winner for anyone
wishing to cluster more than 2-3 nodes.
WADI is also, to my knowledge, the only open source session manager to
really resolve concurrency and serialisation issues within the
HttpSession properly. You cannot serialise a session safely until you
are sure that no user request is running through it. You (probably)
cannot migrate or replicate it without serialisation. WADI uses locking
policies to ensure that container threads performing housekeeping and
serialisation cannot collide with application/request threads modifying
the same object. Jeff, are you aware of anything in TC which does the
same thing? I think that they may keep some count of the number of
request threads active on a session, but last time I looked, I could not
find code that looked like it was checking this before attempting
serialisation or invalidation.
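For illustration, such a locking policy might look like this (my
sketch, not WADI's actual code):

    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Request threads share the session; housekeeping (serialisation,
    // invalidation, migration) must hold it exclusively, so the two can
    // never collide.
    class GuardedSession {
        private final ReadWriteLock lock = new ReentrantReadWriteLock();

        // Many request threads may run through the session concurrently.
        void serviceRequest(Runnable request) {
            lock.readLock().lock();
            try {
                request.run();
            } finally {
                lock.readLock().unlock();
            }
        }

        // Blocks until no request thread is inside, so the snapshot is safe.
        byte[] serialise(Serialiser serialiser) {
            lock.writeLock().lock();
            try {
                return serialiser.toBytes(this);
            } finally {
                lock.writeLock().unlock();
            }
        }

        interface Serialiser {
            byte[] toBytes(GuardedSession session);
        }
    }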
Some parts of WADI should soon (December) be undergoing some serious
testing. When they pass we will be able to consider them production
ready. Others, notably the distributed hash table are still under
development (although a fairly functional version is available in the
SNAPSHOT).
I think that, in the same way Tomcat clustering could be enabled
easily in Geronimo, WADI could also be added by virtue of its
integration with Tomcat/Jetty, but I have been concentrating on my
distributed hash table too hard. If anyone is interested in talking
further about WADI, perhaps trying to plug it into Geronimo (it is
Spring-wired and uses Spring to register its components with JMX; I
guess it should be simple to hook it into the Geronimo kernel in the
same way, I just haven't had the time), or helping out in any way at
all, I would be delighted to hear from them.
I have broached the subject of a common session clustering framework
with members of the OpenEJB team and we have discussed things such as
the colocation of HttpSessions and SFSBs. I believe OpenEJB has been
moving towards JCache to facilitate the plugging in of a clustering
substrate. My distributed hash table is also moving in the same
direction.
So, if I understand correctly, you are working towards some common
infrastructure with OpenEJB... though WADI itself will not address
clustering beyond the web tier?
We've had preliminary discussions. I guess, depending on how much WADI
infrastructure was of interest to OpenEJB, that I would look at
genericising core pieces so that they could deal with SFSBs as well as
HttpSessions. In fact, most of the code already deals with a more
generic abstraction which corresponds roughly to a JCache CacheEntry, so
this should not be hard. Many of the issues faced in the SFSB clustering
world are mirrored in the HttpSession world, except that whilst an
intelligent client-side proxy can solve a lot of location issues for
your SFSB, HttpSessions have to rely on slightly less intelligent e.g.
h/w load-balancers...
There are also interesting issues arising from the integration of
clustered web and EJB tiers, such as the need to colocate HttpSessions
and SFSBs. I have been discussing the possibility of having an
ApplicationSession object which can house a number of web (the portlet
spec complicates this) and EJB sessions, so that if one migrates, they all
end up on the new node together. If we don't have something like this
in place, your application components may end up scattered all over the
cluster.
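A minimal sketch of the idea (all names invented):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of the ApplicationSession idea: a single unit of migration
    // bundling an application's web and EJB conversational state.
    class ApplicationSession {
        private final Map<String, Object> webSessions =
                new HashMap<>(); // keyed per webapp/portlet context
        private final Map<String, Object> ejbSessions =
                new HashMap<>(); // SFSB instances, keyed by bean id

        void addWebSession(String contextPath, Object httpSession) {
            webSessions.put(contextPath, httpSession);
        }

        void addEjbSession(String beanId, Object sfsb) {
            ejbSessions.put(beanId, sfsb);
        }

        // Migration moves this whole bundle at once, so that web and EJB
        // state stay colocated instead of scattering across the cluster.
    }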
Thanks for the update!
You're welcome,
Jules
I hope that gives you all a little more information to go on. If you
have any questions, just fire away,
Jules
Thoughts and opinions are welcomed.
Jeff
--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."
/**********************************
* Jules Gosnell
* Partner
* Core Developers Network (Europe)
*
* www.coredevelopers.net
*
* Open Source Training & Support.
**********************************/