Dave Colasurdo wrote:
Jules Gosnell wrote:
Jeff Genender wrote:
Now that we have achieved the coveted J2EE Certification, we need
to start thinking about some of the things we will need to have in
Geronimo in order for it to be widely adopted by the enterprise.
IMHO, one of the biggest holes is clustering. Many companies have a
pressing need for it, and I believe that until we get a powerful
clustering solution into G, it will not be taken seriously as a J2EE
contender.
So, with that said, I wanted to start a discussion thread on
clustering and what we need to do to get this into Geronimo. I
personally would like to be involved in this (thus the reason for me
starting this thread) - yeah, since Tomcat is done, now I am bored ;-).
I was going over the lists and emails and had some great discussion
with Jules on the WADI project he has built. This seems compelling
to me. I also noticed Active Cluster as a possibility.
So let's start from the top. Do we use an already available
clustering engine or do we roll our own? Here is a small list of
choices I have reviewed, and it is by no means complete...
1) WADI
2) Active Cluster
3) Leverage the Tomcat Clustering engine
So here are some of my questions...
How complete are WADI and Active Cluster? Both look interesting to
me. My only concern with Active Cluster is that it seems to be JMS
based, which I think may be slow for high-performance clustering (am
I incorrect on this?). How mature is WADI?
Here is a status report on WADI.
I'm developing it full time.
A snapshot is available at wadi.codehaus.org - documentation is in
the wiki - at the moment the documentation (rather minimalist) is
more up to date than the snapshot, but I will try to get a fresh one
out next week.
WADI is a pluggable HttpSession Manager replacement for Tomcat-5.0/5.5
and Jetty-5.1/6.0 (it can actually migrate sessions between all four
in the same cluster).
It comprises a vertical stack of pluggable caches/stores (memory,
local disc, db, etc.) through which sessions are demoted as they age
and promoted as and when required to service a request.
Can you please clarify the purpose of promotion/demotion of
HttpSessions? Is this a mechanism to age old entries out of the cache?
I envisage a typically configured stack to look like this:
memory<->localDisc<->cluster<->db.
The db is only used to load sessions if you are the first cluster member
to start, or to store them if you are the last cluster member to stop.
The cluster store gives you access to the sessions held on every other
node in the cluster (more about this later).
The localDisc is where sessions are paged out by a pluggable eviction
strategy running in the memory store (currently based on inactivity,
but it could take into account the number of sessions in memory).
Memory is where sessions and requests are combined in the rendering of
pages.
A request enters the top of the stack and travels downwards towards the
cluster store, until its session is found (or not), at which point the
session is promoted into memory and the request rendered. The session
will stay in memory until evicted downwards, explicitly invalidated, or
(if the eviction strategy is e.g. NeverEvict) implicitly invalidated due
to timeout.
How does this relate to HttpSession inactivity timeouts?
Orthogonally. I thought about pushing timed-out sessions into another
store so that they could be data-mined, but figured that once a session
had been destroyed (all sorts of listeners might have fired) it would be
asking for trouble to try to serialise it. If the application wanted to
keep an archived copy, it could do it via one of these listeners. The
evicters are just there to spool stuff out onto e.g. disc, so that you
can hold larger sessions for more clients.
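As an illustration, such an archived copy could be taken with a
bog-standard HttpSessionListener; SessionArchive and the 'basket'
attribute below are made up:

    import javax.servlet.http.HttpSessionEvent;
    import javax.servlet.http.HttpSessionListener;

    // Sketch of application-level archiving via the standard listener
    // API. SessionArchive is an imaginary application class.
    public class ArchivingListener implements HttpSessionListener {

        public void sessionCreated(HttpSessionEvent se) {
            // nothing to do
        }

        public void sessionDestroyed(HttpSessionEvent se) {
            // Copy out whatever the application wants to data-mine
            // before the session disappears for good.
            SessionArchive.record(se.getSession().getId(),
                                  se.getSession().getAttribute("basket"));
        }
    }

    class SessionArchive {
        static void record(String id, Object attribute) {
            // e.g. write to a database or log; trivial stand-in here
            System.out.println("archived " + id + ": " + attribute);
        }
    }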
Is the cache size configurable?
It would be up to the evicter to use the number of local entries in its
eviction algorithm. I don't have an evicter that does this currently,
but I don't think it would be hard to write one.
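Something along these lines, perhaps (the Evicter interface here is
invented for illustration and will not match WADI's real one):

    import javax.servlet.http.HttpSession;

    // Illustrative only - not WADI's real Evicter interface.
    interface Evicter {
        boolean shouldEvict(HttpSession session, int entriesInStore,
                            long nowMillis);
    }

    // Evicts on inactivity, as now, but also when the store holds too
    // many entries.
    class CountAndAgeEvicter implements Evicter {
        private final int maxEntries;
        private final long maxIdleMillis;

        CountAndAgeEvicter(int maxEntries, long maxIdleMillis) {
            this.maxEntries = maxEntries;
            this.maxIdleMillis = maxIdleMillis;
        }

        public boolean shouldEvict(HttpSession s, int entriesInStore,
                                   long nowMillis) {
            boolean idle = (nowMillis - s.getLastAccessedTime()) > maxIdleMillis;
            boolean full = entriesInStore > maxEntries;
            return idle || full; // spool out when stale, or when memory is crowded
        }
    }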
This stack may be connected horizontally to a cluster by inserting a
clustered store, which uses a distributed hash table (currently
un-replicated, but I am working on it) to share state around the
cluster's members in a scalable manner. WADI has a working mod_jk
integration.
Does this mean that each cluster member shares its HttpSession data
with all of the other members (1->all), or is there the notion of
limiting the HttpSession replication to one (or a few) designated
partners?
This is the most interesting and challenging part of WADI. I learnt from
my early experiences with HttpSession distribution that 1->all
replication is simply a no-go. The point of having a large number of
nodes in a cluster is that availability should improve with every node
you add - with average node availability a, the chance of all n nodes
holding a session being down at once is (1-a)^n, so session availability
is 1-(1-a)^n - not that your architecture forces you to partition your
cluster into n/2 two-node "micro-clusters", capping your availability at
1-(1-a)^2 no matter how many nodes you add (with a = 99%, that is the
difference between e.g. 99.9999% for n = 3 and 99.99%).
There are two distinct issues to deal with - location and replication.
I have solved the first issue (although the code is not ready for prime
time yet). The cluster has a fixed number of 'buckets'. Responsibility
for these buckets is divided between the cluster members, and redivided
on membership changes. Each bucket contains a map of
session-id:location. A Session's id is used to map it to a bucket.
Sessions are free to live on any node in the cluster. If a session is
created/destroyed/migrated, its bucket owner is notified. Requests are
expected to fall, 99 times out of 100, on the node holding the relevant
session, so in this case everything will happen in-vm. Occasionally, due
to node maintenance, load-balancer confusion, etc., requests will fall
elsewhere.
In this case the receiving node can ask the bucket owner for the
session's location and either redirect/proxy the request to the session,
or migrate the session in underneath the incoming request. Since only
one node needs to be informed of a session's location, migrating a
session does not need to involve notification to every cluster member of
the new location.
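Roughly, in code (all names invented, and the remote messaging elided):

    import java.util.HashMap;
    import java.util.Map;

    // Illustration of the fixed-bucket location scheme described above.
    class LocationService {
        static final int NUM_BUCKETS = 1024; // fixed for the cluster's life

        // The subset of buckets this node currently owns; each owned
        // bucket maps session-id -> the node currently holding it.
        private final Map<Integer, Map<String, String>> ownedBuckets =
                new HashMap<>();

        static int bucketFor(String sessionId) {
            return Math.abs(sessionId.hashCode() % NUM_BUCKETS);
        }

        // Called when a session is created/destroyed/migrated; only this
        // one bucket owner needs to hear about it, not the whole cluster.
        void sessionMoved(String sessionId, String newNode) {
            Map<String, String> bucket =
                    ownedBuckets.get(bucketFor(sessionId));
            if (bucket != null) {             // we own this session's bucket
                if (newNode == null) {
                    bucket.remove(sessionId); // session destroyed
                } else {
                    bucket.put(sessionId, newNode);
                }
            }
        }

        // A node receiving a request for a session it does not hold asks
        // the bucket owner where the session lives, then redirects,
        // proxies, or migrates the session in underneath the request.
        String locate(String sessionId) {
            Map<String, String> bucket =
                    ownedBuckets.get(bucketFor(sessionId));
            return bucket == null ? null : bucket.get(sessionId);
        }
    }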
I am working on replication at the moment - here is what I envisage -
every session will be implemented via a master/primary and a number (n)
of slaves/secondaries. I expect n to usually be 1-2. Slaves will be
notified of changes to the master either synchronously or asynchronously
at some point (another pluggable strategy) after they occur. The master
and bucket owner will know the location of master and slaves. Death of
the master will result in a slave being promoted to master and another
slave being recruited. If a request should land on its session's slave,
then rather than migrate the session from its master and then find you
have to recruit a new slave (to avoid having master and slave
colocated), the slave and master may just arrange with the bucket owner
to swap roles.
This actually just describes in-vm replication, which I hope will be
just one pluggable replication backend. Other backends may include e.g.
backup to a db etc.
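As a sketch of the envisaged roles (names are mine, and the recruitment
and messaging machinery is elided):

    import java.util.ArrayList;
    import java.util.List;

    // Sketch of the envisaged primary/secondary arrangement.
    class ReplicatedSession {
        private String primary;               // services requests, owns the state
        private final List<String> secondaries =
                new ArrayList<>();            // usually 1-2 nodes

        // Death of the primary: promote a secondary, recruit a replacement.
        void onPrimaryFailure(String recruitedNode) {
            if (secondaries.isEmpty()) {
                return;                       // no copy survives; session lost
            }
            primary = secondaries.remove(0);
            secondaries.add(recruitedNode);
            // ...notify the bucket owner of the new locations...
        }

        // A request landing on a secondary: swapping roles is cheaper
        // than migrating the session and then recruiting a fresh
        // secondary to keep primary and secondary on different nodes.
        void swapRoles(String secondary) {
            if (secondaries.remove(secondary)) {
                secondaries.add(primary);
                primary = secondary;
                // ...agree the swap with the bucket owner...
            }
        }
    }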
WADI currently sits on top of ActiveCluster, which it uses for
membership notification, and ActiveMQ, which is used for transport by
both layers. ActiveMQ has pluggable protocols, including a peer://
protocol which allows peers to talk directly to one another (this
should put to bed fears of a JMS-based solution not scaling -
remember, JMS is just an API). So you do not need to choose between
WADI and ActiveCluster - they are complementary. ActiveCluster can
also (I believe) use JGroups as a transport - I haven't tried it.
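For instance, a connection over the peer transport looks no different
to the code using it than any other JMS connection (the group and
broker names below are made up, and the package name is as in current
ActiveMQ releases):

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;

    import org.apache.activemq.ActiveMQConnectionFactory;

    // Minimal sketch: the peer:// transport embeds a broker in each VM
    // and lets members of the same group talk directly to one another.
    public class PeerTransportExample {
        public static void main(String[] args) throws Exception {
            ConnectionFactory factory =
                    new ActiveMQConnectionFactory("peer://wadi-demo/nodeA");
            Connection connection = factory.createConnection();
            connection.start();
            // ...publish and subscribe exactly as with any other JMS
            // provider; only the broker URL changes - JMS is just an API.
            connection.close();
        }
    }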
ActiveSpace is another technology in this area (distributed caching)
and it looks as if WADI and ActiveSpace will become more closely
aligned. So this may also be considered a complementary technology.
Both Tomcat and Jetty currently have existing clustering solutions. I
looked closely at the Tomcat solutions before starting out on WADI
and knew all about the Jetty solution, because I wrote it :-). WADI
is my answer to what I see as shortcomings in all of the existing
open source approaches to this problem-space.
Can you provide a quick high level description of the advantages of
WADI over Tomcat and Jetty clustering solutions?
Jetty uses 1->all replication over JGroups, as I believe one Tomcat
session manager does. I think the other Tomcat session manager also does
1->all replication, but over its own protocol. Perhaps Jeff can confirm
this. I think TC's 'PersistentManager' is also able to write changed
sessions out to disc at the end of the request.
1->all, for the reasons given above, will not scale. The more nodes you
add, the more notifications each will have to react to and the more
sessions it will have to hold. You are simply deferring your problems
for a little while. Your only way out is to partition the cluster and
sacrifice your availability. When WADI's in-vm replication strategy is
finished, I think that this will make it a clear winner for anyone
wishing to cluster more than 2-3 nodes.
WADI is also, to my knowledge, the only open source session manager to
really resolve concurrency and serialisation issues within the
HttpSession properly. You cannot serialise a session safely until you
are sure that no user request is running through it. You (probably)
cannot migrate or replicate it without serialisation. WADI uses locking
policies to ensure that container threads performing housekeeping and
serialisation cannot collide with application/request threads modifying
the same object. Jeff, are you aware of anything in TC which does the
same thing? I think that they may keep some count of the number of
request threads active on a session, but last time I looked, I could not
find code that looked like it was checking this before attempting
serialisation or invalidation.
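For illustration, such a locking policy might look like this (my
sketch, not WADI's actual code):

    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    // Request threads share the session; housekeeping (serialisation,
    // invalidation, migration) must hold it exclusively, so the two can
    // never collide.
    class GuardedSession {
        private final ReadWriteLock lock = new ReentrantReadWriteLock();

        // Many request threads may run through the session concurrently.
        void serviceRequest(Runnable request) {
            lock.readLock().lock();
            try {
                request.run();
            } finally {
                lock.readLock().unlock();
            }
        }

        // Blocks until no request thread is inside, so the snapshot is safe.
        byte[] serialise(Serialiser serialiser) {
            lock.writeLock().lock();
            try {
                return serialiser.toBytes(this);
            } finally {
                lock.writeLock().unlock();
            }
        }

        interface Serialiser {
            byte[] toBytes(GuardedSession session);
        }
    }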
Some parts of WADI should soon (December) be undergoing some serious
testing. When they pass we will be able to consider them production
ready. Others, notably the distributed hash table are still under
development (although a fairly functional version is available in the
SNAPSHOT).
I think that, in the same way Tomcat clustering could be enabled
easily in Geronimo, WADI could also be added by virtue of its
integration with Tomcat/Jetty, but I have been concentrating on my
distributed hash table too hard. If anyone is interested in talking
further about WADI, perhaps trying to plug it into Geronimo (it is
Spring-wired and uses Spring to register its components with JMX; I
guess it should be simple to hook it into the Geronimo kernel in the
same way, I just haven't had the time), or helping out in any way at
all, I would be delighted to hear from them.
I have broached the subject of a common session clustering framework
with members of the OpenEJB team and we have discussed things such as
the colocation of HttpSessions and SFSBs. I believe OpenEJB has been
moving towards JCache to facilitate the plugging in of a clustering
substrate. My distributed hash table is also moving in the same
direction.
So, if I understand correctly, you are working towards some common
infrastructure with OpenEJB... though WADI itself will not address
clustering beyond the web tier?
We've had preliminary discussions. I guess, depending on how much WADI
infrastructure was of interest to OpenEJB, that I would look at
genericising core pieces so that they could deal with SFSBs as well as
HttpSessions. In fact, most of the code already deals with a more
generic abstraction which corresponds roughly to a JCache CacheEntry, so
this should not be hard. Many of the issues faced in the SFSB clustering
world are mirrored in the HttpSession world, except that whilst an
intelligent client-side proxy can solve a lot of location issues for
your SFSB, HttpSessions have to rely on slightly less intelligent e.g.
h/w load-balancers...
There are also interesting issues arising from the integration of
clustered web and EJB tiers, such as the need to colocate HttpSessions
and SFSBs. I have been discussing the possibility of having an
ApplicationSession object which can house a number of web (the portlet
spec complicates this) and EJB sessions, so that if one migrates, they all
end up on the new node together. If we don't have something like this
in place, your application components may end up scattered all over the
cluster.
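A minimal sketch of the idea (all names invented):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch of the ApplicationSession idea: a single unit of migration
    // bundling an application's web and EJB conversational state.
    class ApplicationSession {
        private final Map<String, Object> webSessions =
                new HashMap<>(); // keyed per webapp/portlet context
        private final Map<String, Object> ejbSessions =
                new HashMap<>(); // SFSB instances, keyed by bean id

        void addWebSession(String contextPath, Object httpSession) {
            webSessions.put(contextPath, httpSession);
        }

        void addEjbSession(String beanId, Object sfsb) {
            ejbSessions.put(beanId, sfsb);
        }

        // Migration moves this whole bundle at once, so that web and EJB
        // state stay colocated instead of scattering across the cluster.
    }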
Thanks for the update!
You're welcome,
Jules
I hope that gives you all a little more information to go on. If you
have any questions, just fire away,
Jules
Thoughts and opinions are welcomed.
Jeff
--
"Open Source is a self-assembling organism. You dangle a piece of
string into a super-saturated solution and a whole operating-system
crystallises out around it."
/**********************************
* Jules Gosnell
* Partner
* Core Developers Network (Europe)
*
* www.coredevelopers.net
*
* Open Source Training & Support.
**********************************/