Dave Colasurdo wrote:



Jules Gosnell wrote:

Jeff Genender wrote:

Now that we have achieved the coveted J2EE Certification, we need to start thinking about some of the things we will need to have in Geronimo in order to be mass adopted by the Enterprise.

IMHO, one of the biggest holes is clustering. Many companies need it badly, and I believe that until we get a powerful clustering solution into G, it will not be taken seriously as a J2EE contender.

So, with that said, I wanted to start a discussion thread on clustering and what we need to do to get this into Geronimo. I personally would like to be involved in this (thus the reason for me starting this thread) - yeah, since Tomcat is done, now I am bored ;-).

I was going over the lists and emails and had some great discussion with Jules on the WADI project he has built. This seems compelling to me. I also noticed Active Cluster as a possibility.

So let's start from the top. Do we use an already available clustering engine or do we roll our own? Here is a small list of choices I have reviewed and it is by no means complete...

1) WADI
2) Active Cluster
3) Leverage the Tomcat Clustering engine

So here are some of my questions...

How complete are WADI and Active Cluster? Both look interesting to me. My only concern with Active Cluster is that it seems to be JMS-based, which I think may be slow for high-performance clustering (am I incorrect on this?). How mature is WADI?



Here is a status report on WADI.

I'm developing it full time.

A snapshot is available at wadi.codehaus.org - documentation is in the wiki - at the moment the documentation (rather minimalist) is more up to date than the snapshot, but I will try to get a fresh one out next week.

WADI is a plug-in HttpSession Manager replacement for Tomcat-5.0/5.5 and Jetty-5.1/6.0 (it can actually migrate sessions between all four in the same cluster). It comprises a vertical stack of pluggable caches/stores (memory, local disc, db etc.) through which sessions are demoted as they age and promoted as and when required to service a request.


Can you please clarify the purpose of promotion/demotion of httpsessions? Is this a mechanism to age old entries out of the cache?

I envisage a typically configured stack to look like this : memory<->localDisc<->cluster<->db.

The db is only used to load sessions if you are the first cluster member to start, or to store them if you are the last cluster member to stop. The cluster store gives you access to the sessions held on every other node in the cluster (more about this later). The localDisc is where sessions are paged out by a pluggable eviction strategy running in the memory store (currently based on inactivity, but it could take into account the number of sessions in memory). Memory is where sessions and requests are combined in the rendering of pages.

A request enters the top of the stack and travels downwards towards the cluster store, until its session is found (or not), at which point the session is promoted into memory and the request rendered. The session will stay in memory, until evicted downwards, explicitly invalidated, or (if the eviction strategy is e.g. NeverEvict) implicitly invalidated due to time out.
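To make the descent-and-promote behaviour concrete, here is a minimal sketch in Java. The class and method names are mine, not WADI's actual API; it just models a request walking down the tiers (memory, localDisc, cluster, db) until the session turns up, then promoting it into memory, with an explicit demote hook an evicter might call.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of WADI's tiered lookup (illustrative names only):
// a request descends the stack until its session is found, at which
// point the session is promoted back into the memory tier.
public class TieredSessionStore {

    private final List<Map<String, Object>> tiers; // index 0 = memory, last = db

    public TieredSessionStore(List<Map<String, Object>> tiers) {
        this.tiers = tiers;
    }

    /** Search downwards; on a hit, promote the session into memory. */
    public Object locate(String sessionId) {
        for (Map<String, Object> tier : tiers) {
            Object session = tier.remove(sessionId);
            if (session != null) {
                tiers.get(0).put(sessionId, session); // promotion
                return session;
            }
        }
        return null; // session does not exist anywhere in the stack
    }

    /** An evicter demotes an aged session one tier downwards. */
    public void demote(String sessionId, int fromTier) {
        Object session = tiers.get(fromTier).remove(sessionId);
        if (session != null && fromTier + 1 < tiers.size()) {
            tiers.get(fromTier + 1).put(sessionId, session);
        }
    }
}
```

The real thing obviously involves serialisation at the disc/db boundaries and locking in the cluster tier; this only shows the promotion/demotion shape of the stack.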

How does this relate to httpsession inactivity timeouts?

Orthogonally. I thought about pushing timed-out sessions into another store so that they could be data-mined, but figured that once a session had been destroyed (all sorts of listeners might have fired) it would be asking for trouble to try to serialise it. If the application wanted to keep an archived copy, it could do it via one of these listeners. The evicters are just there to spool stuff out onto e.g. disc, so that you can hold larger sessions for more clients.

Is the cache size configurable?

It would be up to the evicter to use the number of local entries in its eviction algorithm. I don't have an evicter that does this currently, but I don't think it would be hard to write one.
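A count-based evicter of the kind described would not be much code. This is a hypothetical sketch (not a WADI class): it caps the memory tier at a fixed number of sessions and hands the least-recently-accessed one to an eviction hook, which in WADI terms would demote it to the next tier down. `LinkedHashMap` in access order does the LRU bookkeeping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative count-based evicter: once the map holds more than
// maxSessions entries, the least-recently-accessed entry is evicted.
// A real evicter would demote the entry to e.g. localDisc instead of
// just remembering it.
public class CountBasedEvicter<V> extends LinkedHashMap<String, V> {

    private final int maxSessions;
    private Map.Entry<String, V> lastEvicted; // stand-in for the demotion hook

    public CountBasedEvicter(int maxSessions) {
        super(16, 0.75f, true); // true = iterate in access order, oldest first
        this.maxSessions = maxSessions;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        if (size() > maxSessions) {
            lastEvicted = eldest; // demote to the next tier here
            return true;
        }
        return false;
    }

    public Map.Entry<String, V> lastEvicted() {
        return lastEvicted;
    }
}
```

Combining this with the inactivity-based strategy (evict when either condition trips) would cover both cases Jules mentions.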

This stack may be connected horizontally to a cluster by inserting a clustered store, which uses a distributed hash table (currently un-replicated, but I am working on it) to share state around the cluster's members in a scalable manner. WADI has a working mod_jk integration.


Does this mean that each cluster member shares its httpsession data with all of the other members (1->all) or is there the notion of limiting the httpsession replication to one (or a few) designated partners?

This is the most interesting and challenging part of WADI. I learnt from my early experiences with httpsession distribution that 1->all replication is simply a no-go. The point of having a large number of nodes in a cluster is that the chance of losing a session drops to average-node-unavailability^number-of-nodes; your architecture should not force you to partition the cluster into n/2 "micro-clusters", which pushes that chance back up to average-node-unavailability^2.

There are two distinct issues to deal with - location and replication.

I have solved the first issue (although the code is not ready for prime time yet). The cluster has a fixed number of 'buckets'. Responsibility for these buckets is divided between the cluster members, and redivided on membership changes. Each bucket contains a map of session-id:location. A session's id is used to map it to a bucket. Sessions are free to live on any node in the cluster. If a session is created/destroyed/migrated, its bucket owner is notified. Requests are expected to fall, 99 times out of 100, on the node holding the relevant session, so in that case everything happens in-vm. Occasionally, due to node maintenance, load-balancer confusion etc., requests will fall elsewhere. In this case the receiving node can ask the bucket owner for the session's location and either redirect/proxy the request to the session, or migrate the session in underneath the incoming request. Since only one node needs to be informed of a session's location, migrating a session does not require notifying every cluster member of the new location.
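The bucket scheme can be sketched in a few lines. All names here are hypothetical, not WADI's code: a fixed number of buckets, each owned by one cluster member and holding a session-id -> location map; a session id hashes to exactly one bucket, so only that bucket's owner ever needs notifying when the session is created, destroyed or migrated.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal sketch of the bucketed location directory described above.
// Rebalancing on membership change is omitted for brevity.
public class LocationDirectory {

    private final List<String> bucketOwner = new ArrayList<>();       // owning node per bucket
    private final List<Map<String, String>> buckets = new ArrayList<>(); // session-id -> node

    public LocationDirectory(int numBuckets, List<String> nodes) {
        for (int i = 0; i < numBuckets; i++) {
            bucketOwner.add(nodes.get(i % nodes.size())); // responsibility divided
            buckets.add(new HashMap<>());
        }
    }

    private int bucketOf(String sessionId) {
        return Math.floorMod(sessionId.hashCode(), buckets.size());
    }

    /** The single node that must be told about this session's movements. */
    public String ownerFor(String sessionId) {
        return bucketOwner.get(bucketOf(sessionId));
    }

    /** Called on create/migrate: exactly one bucket map is updated. */
    public void record(String sessionId, String node) {
        buckets.get(bucketOf(sessionId)).put(sessionId, node);
    }

    /** A node receiving a misdirected request asks the bucket owner here. */
    public String locate(String sessionId) {
        return buckets.get(bucketOf(sessionId)).get(sessionId);
    }
}
```

Note that a migration is a single `record` call to one bucket owner, which is exactly why this scales where 1->all notification does not.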

I am working on replication at the moment - here is what I envisage: every session will be implemented via a master/primary and a number (n) of slaves/secondaries. I expect n to usually be 1-2. Slaves will be notified of changes to the master either synchronously or asynchronously at some point after they occur (another pluggable strategy). The master and bucket owner will know the locations of master and slaves. Death of the master will result in a slave being promoted to master and another slave being recruited. If a request should land on a session's slave, then rather than migrating the session from its master and having to recruit a new slave (to avoid colocating master and slave), the slave and master may simply arrange with the bucket owner to swap roles.

This actually just describes in-vm replication, which I hope will be a single replication backend. Other backends may include e.g. backup to a db etc.




WADI currently sits on top of ActiveCluster, which it uses for membership notification, and ActiveMQ, which is used for transport by both layers. ActiveMQ has pluggable protocols, including a peer:// protocol which allows peers to talk directly to one another (this should put to bed fears of a JMS-based solution not scaling - remember, JMS is just an API). So you do not need to choose between WADI and ActiveCluster - they are complementary. ActiveCluster can also (I believe) use JGroups as a transport - I haven't tried it.

ActiveSpace is another technology in this area (distributed caching) and it looks as if WADI and ActiveSpace will become more closely aligned. So this may also be considered a complementary technology.

Both Tomcat and Jetty currently have existing clustering solutions. I looked closely at the Tomcat solutions before starting out on WADI and knew all about the Jetty solution, because I wrote it :-). WADI is my answer to what I see as shortcomings in all of the existing open source approaches to this problem-space.

Can you provide a quick high level description of the advantages of WADI over Tomcat and Jetty clustering solutions?

Jetty uses 1->all replication over JGroups, as I believe one of the Tomcat session managers does. I think the other Tomcat session manager also does 1->all replication, but over its own protocol. Perhaps Jeff can confirm this. I think TC's 'PersistentManager' is also able to write changed sessions out to disc at the end of the request.

1->all, for the reasons given above, will not scale. The more nodes you add, the more notifications each will have to react to and the more sessions each will have to hold. You are simply deferring your problems for a little while; your only way out is to partition the cluster and sacrifice your availability. When WADI's in-vm replication strategy is finished, I think this will make it a clear winner for anyone wishing to cluster more than 2-3 nodes.

WADI is also, to my knowledge, the only open source session manager to properly resolve concurrency and serialisation issues within the httpsession. You cannot serialise a session safely until you are sure that no user request is running through it, and you (probably) cannot migrate or replicate it without serialisation. WADI uses locking policies to ensure that container threads performing housekeeping and serialisation cannot collide with application/request threads modifying the same object. Jeff, are you aware of anything in TC which does the same thing? I think they may keep some count of the number of request threads active on a session, but last time I looked, I could not find code that checked this before attempting serialisation or invalidation.
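One common way to express that locking policy (this is an illustration of the idea, not WADI's actual classes) is a read/write lock: request threads share the session under the read lock, while housekeeping (serialisation for migration, invalidation) takes the exclusive write lock, so it can never run while a request thread is still inside the session.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a session guarded against the request/housekeeping
// race described above. Many request threads may hold the read lock at
// once; serialisation must wait for all of them to leave.
public class GuardedSession {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean valid = true;
    private byte[] lastSnapshot;

    /** Application/request threads: may enter concurrently. */
    public void withRequest(Runnable work) {
        lock.readLock().lock();
        try {
            if (!valid) throw new IllegalStateException("session invalidated");
            work.run();
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Container thread: acquires exclusivity before touching the session. */
    public byte[] serialiseForMigration(byte[] state) {
        lock.writeLock().lock();
        try {
            lastSnapshot = state.clone(); // safe: no request is mid-flight
            return lastSnapshot;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

A thread-count scheme like the one attributed to TC can work too, but only if the count is actually consulted (and held stable) before serialisation begins, which is the check Jules says he could not find.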



Some parts of WADI should soon (December) be undergoing some serious testing. When they pass we will be able to consider them production ready. Others, notably the distributed hash table are still under development (although a fairly functional version is available in the SNAPSHOT).

I think that, in the same way Tomcat clustering could be enabled easily in Geronimo, WADI could also be added by virtue of its integration with Tomcat/Jetty, but I have been concentrating on my distributed hash table too hard. If anyone is interested in talking further about WADI, perhaps trying to plug it into Geronimo (It is spring-wired and uses spring to register its components with JMX. I guess it should be simple to hook it into the Geronimo kernel in the same way, I just haven't had the time), or helping out in any way at all, I would be delighted to hear from them.

I have broached the subject of a common session clustering framework with members of the OpenEJB team and we have discussed things such as the colocation of HttpSessions and SFSBs. I believe OpenEJB has been moving towards JCache to facilitate the plugging in of a clustering substrate. My distributed hash table is also moving in the same direction.

So, if I understand correctly, you are working towards some common infrastructure with OpenEJB... though WADI itself will not address clustering beyond the web tier?

We've had preliminary discussions. I guess, depending on how much WADI infrastructure was of interest to OpenEJB, that I would look at genericising core pieces so that they could deal with SFSBs as well as HttpSessions. In fact, most of the code already deals with a more generic abstraction which corresponds roughly to a JCache CacheEntry, so this should not be hard. Many of the issues faced in the SFSB clustering world are mirrored in the HttpSession world, except that whilst an intelligent client-side proxy can solve a lot of location issues for your SFSB, HttpSessions have to rely on slightly less intelligent e.g. h/w load-balancers...

There are also interesting issues arising from the integration of clustered web and ejb tiers, such as the need to colocate httpsessions and SFSBs. I have been discussing the possibility of having an ApplicationSession object which can house a number of web (portlet spec complicates this) and ejb sessions, so that if one migrates, they all end up on the new node together. If we don't have something like this in place, your application components may end up scattered all over the cluster.


Thanks for the update!

You're welcome,


Jules

I hope that gives you all a little more information to go on. If you have any questions, just fire away,


Jules



Thoughts and opinions are welcomed.

Jeff







--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."

/**********************************
* Jules Gosnell
* Partner
* Core Developers Network (Europe)
*
*    www.coredevelopers.net
*
* Open Source Training & Support.
**********************************/
