Hey,
I've found a post on clustering on another Java mailing list. The post is originally from a vendor (GemStone), but some of its points seem very valid.
<post>
Extreme Clustering starts with the idea that we run many VMs on the same machine. Then we provide a solution for a problem that every distributed system has (and some that are not so distributed): how do I find and use server-side resources? Bigger problem: how do I dispose of them when they are shared? This is a problem for DCOM, CORBA, RMI and every other system based on distributed objects. Actually, if you want to know the basis of your memory leaks in Windows 95, 98, and NT, it's because of shared DLLs. Once loaded, the 95 and 98 shells and the NT OS don't know when to unload them (this problem is isomorphic to distributed objects). So, how does this work?
We provide a process known as the activator. The activator knows which EJBs are deployed in the system. It keeps information on how to create VM pools. It keeps an activation record which points EJB instantiations to VMs within a particular pool. The pools can grow and shrink to meet the demands placed on the server. This pooling behavior is built into the activator. So, when I deploy a system, I use a template to describe the configuration of a VM in a pool and then the properties of the pool. I then tell the activator which EJBs will use which VM pool. Home lookups are handled by the activator so it can load-balance calls to the various VMs in the pools. We also manage JDBC pools.
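Just to make the activator idea concrete, here is a toy sketch. The names (ActivatorSketch, VmPool, lookupHome) are made up for the example, not our actual API or config format; it only shows the shape: an activation record maps a deployed EJB to a pool of VM endpoints, the pool can grow and shrink, and home lookups are answered round-robin across the pool.

    import java.util.*;
    import java.util.concurrent.atomic.AtomicInteger;

    // Toy sketch of an activator: an activation record maps each deployed EJB
    // name to a pool of VM endpoints, and home lookups are spread round-robin
    // across the pool. The pool can grow and shrink while the system runs.
    public class ActivatorSketch {

        static class VmPool {
            private final List<String> endpoints = new ArrayList<>(); // host:port of each VM
            private final AtomicInteger next = new AtomicInteger();

            void add(String endpoint)    { endpoints.add(endpoint); }    // pool grows
            void remove(String endpoint) { endpoints.remove(endpoint); } // pool shrinks
            String pick() {                                              // simple load balancing
                return endpoints.get(Math.floorMod(next.getAndIncrement(), endpoints.size()));
            }
        }

        private final Map<String, VmPool> activationRecord = new HashMap<>();

        void deploy(String ejbName, VmPool pool) { activationRecord.put(ejbName, pool); }

        // A home lookup goes through the activator, which picks a VM in the EJB's pool.
        String lookupHome(String ejbName) { return activationRecord.get(ejbName).pick(); }

        public static void main(String[] args) {
            ActivatorSketch activator = new ActivatorSketch();
            VmPool pool = new VmPool();
            pool.add("vm-a:9001");
            pool.add("vm-b:9002");
            activator.deploy("CartBean", pool);
            System.out.println(activator.lookupHome("CartBean")); // vm-a:9001
            System.out.println(activator.lookupHome("CartBean")); // vm-b:9002
        }
    }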
Now, if you look at WebLogic, they run a single VM in a single IP space. WL claims that this is good enough, but I've got four knocks against this claim:
1) Maturity of Java (yes, it's still a new technology and as a result there are some problems).
2) Threading and thread scheduling in the VM (even with fully threaded OSes, there are VM limits).
3) Application availability (take this from a user's point of view).
4) Cache coherency (in multi-VM architectures, how do you keep everybody in sync?).
1) VMs need to manage two memory spaces, Java and C. Although VMs are getting better, they are still not perfect. Worse, a VM cannot control how a JNI call manages C heap space. Thus VMs will bloat and run out of memory over time, and before they run out of memory, their performance starts degrading.
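You can watch the blind spot yourself: the VM only reports on its own Java heap, so memory that JNI code allocates on the C heap never shows up in these numbers even while the OS-reported process size grows. The little loop below is just an illustration, nothing product-specific.

    // Toy illustration only: the VM can report on its own Java heap, but memory
    // that a JNI call allocates on the C heap never shows up in these numbers,
    // so the process can bloat while the Java-side figures look healthy.
    public class HeapWatch {
        public static void main(String[] args) throws InterruptedException {
            Runtime rt = Runtime.getRuntime();
            while (true) {
                long usedKb = (rt.totalMemory() - rt.freeMemory()) / 1024;
                System.out.println("java heap used: " + usedKb
                    + " KB -- compare against the process size the OS reports");
                Thread.sleep(5000);
            }
        }
    }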
2) Threading and thread scheduling. This is something that you can check for yourself. Even though it will take 1500-1800 threads to run a VM out of memory, you'll find that if you're doing any amount of work in each thread, you start killing performance in a VM after about 200 threads. If you are interested, I can give a better explanation, but it involves understanding thread scheduling on Unix (or Solaris, with its LWP-to-thread relationship) versus NT (no LWPs). IME, it doesn't take long to saturate a VM with 200 threads.
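Something like the harness below is all it takes to run the experiment yourself. It's only a rough sketch; the exact numbers will depend on your VM, OS and hardware, but watch how the time behaves as the thread count climbs.

    // Rough do-it-yourself harness: give each thread a fixed chunk of CPU-bound
    // work and time how long a batch of threads takes as the count climbs. The
    // absolute numbers depend on the VM, the OS thread/LWP model and the
    // hardware; the point is how throughput behaves well before memory runs out.
    public class ThreadSaturation {
        static volatile long sink; // keep the optimizer from discarding the work

        public static void main(String[] args) throws Exception {
            int[] counts = { 10, 50, 200, 800 };
            for (int n : counts) {
                Thread[] workers = new Thread[n];
                long start = System.currentTimeMillis();
                for (int i = 0; i < n; i++) {
                    workers[i] = new Thread(() -> {
                        long x = 0;
                        for (int j = 0; j < 2_000_000; j++) x += j % 7; // "some amount of work"
                        sink = x;
                    });
                    workers[i].start();
                }
                for (Thread t : workers) t.join();
                System.out.println(n + " threads: " + (System.currentTimeMillis() - start) + " ms");
            }
        }
    }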
3) Application availability. How do stability, performance and maintenance issues affect a user's perception of availability? I take the user's view because it's the most important one. So, done properly, even an unstable application can provide the perception of high availability. (Yes, I know that sounds strange, but...) This is where an application server can help you out, by providing services that help make an unstable application appear more stable, adjust to load to provide better performance characteristics, and isolate users from maintenance issues (including upgrading or patching the application).
4) Cache coherency: how do you keep data in different, isolated memory spaces in sync? Well, you could use distributed transactions, which would involve every cache that needs to be synced in the system. We all know how that would perform... :). Or, we could allow for latency between updates and accept some inconsistency between caches.
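To make that second option concrete, here is a toy sketch (nothing to do with our actual cache): each VM keeps a local map and applies updates from an asynchronous feed, so a reader can briefly see stale data in exchange for never blocking on a distributed transaction.

    import java.util.Map;
    import java.util.concurrent.*;

    // Toy sketch of the "allow for latency" option: instead of a distributed
    // transaction that touches every cache on every write, each VM keeps a local
    // map and applies updates from an asynchronous feed. A reader can briefly
    // see a stale value; that is the inconsistency being traded for speed.
    public class LazyCache {
        private final Map<String, String> local = new ConcurrentHashMap<>();
        private final BlockingQueue<String[]> updates = new LinkedBlockingQueue<>();

        LazyCache() {
            Thread applier = new Thread(() -> {
                try {
                    while (true) {
                        String[] u = updates.take(); // arrives some time after the write
                        local.put(u[0], u[1]);
                    }
                } catch (InterruptedException ignored) { }
            });
            applier.setDaemon(true);
            applier.start();
        }

        void remoteUpdate(String key, String value) { updates.add(new String[] { key, value }); }
        String read(String key) { return local.get(key); } // may lag the latest write

        public static void main(String[] args) throws InterruptedException {
            LazyCache cache = new LazyCache();
            cache.remoteUpdate("price:42", "19.99");
            System.out.println(cache.read("price:42")); // may print null; update not applied yet
            Thread.sleep(100);
            System.out.println(cache.read("price:42")); // 19.99 once the applier catches up
        }
    }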
So, how does GSJ help with these problems? For 1, 2, and 3, VM pooling is the answer. It allows me to replace server VMs at will. Thus, without taking the system down, I can remove bloated VMs, adjust the number of VMs in the system to account for load, deploy a new version of an application along with a new set of VM pools to handle that new version, and then migrate users over to the new version while the old one is still running. All of these activities are possible as a result of the features in the product. These features help deployed applications appear more stable than they may actually be, so we help the perception of stability, availability and performance.
The PCA helps with cache coherency because you only have one cache that is shared between VMs. And because it's transactional, it's safe. And because you are effectively in the same memory space (even if the VMs are running on different machines), it's transparent and very performant.
BTW, all of these features are buried in J2EE-compliant APIs. And we have an integrated web application server and our own Java CORBA implementation (OMG-compliant). These components are also integrated into the activation services. I could go on with other features, but I think this is long enough for now.
Now, one real-world example (I have a few to choose from). I was at one site where the application teams were running into major problems. The first two problems were a memory leak in the JDBC driver (which causes a VM bloat problem) and not properly catching and re-throwing exceptions (as per the EJB spec), which resulted in the container discarding the bean. While all of this was going on, the site was maintaining a hit rate of between 2 and 4 million hits/day, mostly during core business hours. Users were not complaining because they did not see this instability. On the other hand, the development team was going full tilt trying to keep the site going, identify problems and fix them. I claim that their application would have knocked out any other application server and brought their site down with it. I'll debate this point with anyone, anytime, anywhere.
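For anyone who hasn't hit that exception problem, here is a hypothetical fragment (not the customer's code) of the rule they tripped over: per the EJB spec, a system exception escaping a bean method makes the container discard that bean instance, so recoverable failures should be caught and re-thrown as application exceptions.

    import java.sql.SQLException;
    import javax.ejb.EJBException;

    // Hypothetical fragment: letting a system exception (RuntimeException or
    // EJBException) escape a bean method makes the container discard the bean
    // instance, while a declared application exception does not. Recoverable
    // problems should therefore be re-thrown as application exceptions.
    public class OrderBeanFragment {

        // Application exception: the caller handles it and the instance survives.
        public static class OrderFailedException extends Exception {
            public OrderFailedException(String msg, Throwable cause) { super(msg, cause); }
        }

        public void placeOrder(String id) throws OrderFailedException {
            try {
                storeOrder(id);
            } catch (SQLException e) {
                // Recoverable: re-throw as an application exception, not a system one.
                throw new OrderFailedException("could not store order " + id, e);
            } catch (RuntimeException e) {
                // Genuinely unexpected: wrap in EJBException so the container rolls
                // back and discards the instance -- reserved for real failures.
                throw new EJBException(e);
            }
        }

        private void storeOrder(String id) throws SQLException {
            // placeholder for the real JDBC work that was leaking and failing
        }
    }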
</post>
regards
Roberto