(First post and it's long - bear with me)

I'm currently looking into things like fault tolerance and load balancing when
building applications from components. This has included looking at the pros and
cons of COM+, EJB, CORBA Components and Fault Tolerant CORBA.

One thing that seems conspicuously missing from most documentation I've perused
so far is replication of state combined with load balancing, as opposed to
replication just for the sake of redundancy. Sounds crazy? Well, let me explain
further...

Take a system built solely on stateless components (let's ignore the
technology: it doesn't really matter if it's COM+ or EJB or something else). To
achieve scalability, one typically configures a cluster of servers, each running
a replica of the stateless component. Since there is no state, fail-over becomes
as easy as load balancing a request to another server in case one of them
fails, and as long as at least one member of the cluster is running, the
system isn't considered 'down'.
  No network overhead is required to keep the components' state synchronized
(at least not from the component's point of view - an underlying distributed
database might of course disagree :-) since any component can serve any
request. The cluster can thus scale well to lots of machines, and no machines
are "wasted" as standby backup servers.
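To make the stateless case concrete, here is a minimal Java sketch of
"fail-over is just load balancing". All class and method names are made up for
illustration and are not taken from COM+, EJB or CORBA; a real cluster would of
course make remote calls rather than local ones.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are invented, not part of any real API.
class StatelessCluster {
    // One entry per server; each hosts an identical stateless replica.
    static final class Server {
        final String name;
        volatile boolean up = true;

        Server(String name) { this.name = name; }

        // Stateless: the result depends only on the request.
        String handle(String request) {
            if (!up) throw new IllegalStateException(name + " is down");
            return name + ":" + request.toUpperCase();
        }
    }

    private final List<Server> servers = new ArrayList<>();
    private int next = 0;

    Server add(String name) {
        Server s = new Server(name);
        servers.add(s);
        return s;
    }

    // Round-robin over the servers; if one fails, just try the next one.
    String call(String request) {
        for (int attempt = 0; attempt < servers.size(); attempt++) {
            Server s = servers.get(next);
            next = (next + 1) % servers.size();
            try {
                return s.handle(request);
            } catch (IllegalStateException e) {
                // Nothing to recover: no state was lost, so move on.
            }
        }
        throw new IllegalStateException("no server available");
    }
}
```

Note that the retry path needs no recovery step at all, which is exactly what
makes this model scale so easily.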

Contrast this to the case where a component is stateful: The typical
fault-tolerance configurations are the cold (logged state), warm (checkpointed
state to standby replica), or hot (several replicas all serving requests) ones.
(This is CORBA terminology but the general idea should be clear, I hope.)

Now... with state, it seems we either end up with (at least) one machine simply
standing by as a backup in case something goes wrong with the primary one
(cold, warm), or have two (or more) machines all doing the same work, with just
one answer getting passed back to the client (hot). Compared to the stateless
case, this doesn't scale nearly as well, and it wastes computational resources,
since the machines are never all busy serving different requests.

So to finally get to the point: Does anyone here have knowledge of or opinions
about how (if at all possible) one could design a system that allowed stateful
components to be load-balanced in a large cluster, like stateless ones, but with
maintained state consistency across (enough - perhaps not necessarily
all) replicas that the system remains fault tolerant? Contrasting issues here
are of course that one wants to keep recovery times due to server failures to a
minimum, while at the same time minimizing the network overhead required to keep
components' state synchronized.
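To illustrate the kind of design I have in mind (a rough sketch only, under my
own assumptions, not a claim about how any existing product works): each
stateful component instance could be assigned a primary server by hashing its
id, with the next server in the ring acting as its backup. Every server then
serves requests for "its" components while also backing up a neighbour's, and
each update costs one copy to one other machine instead of a broadcast to the
whole cluster. All names below are hypothetical, and the counter stands in for
arbitrary component state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: every name here is invented for illustration.
class PartitionedCluster {
    static final class Node {
        volatile boolean up = true;
        // State for the components this node is primary or backup for.
        final Map<String, Integer> counters = new HashMap<>();
    }

    private final List<Node> nodes = new ArrayList<>();

    PartitionedCluster(int size) {
        for (int i = 0; i < size; i++) nodes.add(new Node());
    }

    Node node(int i) { return nodes.get(i); }

    // The component id picks the primary...
    int primaryIndex(String componentId) {
        return Math.floorMod(componentId.hashCode(), nodes.size());
    }

    // ...and the next node in the ring is its backup.
    int backupIndex(String componentId) {
        return (primaryIndex(componentId) + 1) % nodes.size();
    }

    // Increment a per-component counter (stand-in for any stateful call).
    int increment(String componentId) {
        Node primary = nodes.get(primaryIndex(componentId));
        Node backup = nodes.get(backupIndex(componentId));
        Node serving = primary.up ? primary : backup;
        if (!serving.up) throw new IllegalStateException("both replicas down");
        int value = serving.counters.merge(componentId, 1, Integer::sum);
        // Copy the new state to the other replica, if it is alive.
        Node other = (serving == primary) ? backup : primary;
        if (other.up) other.counters.put(componentId, value);
        return value;
    }
}
```

The sketch deliberately ignores the hard parts: bringing a restarted server
back up to date, choosing a new backup after a failure, and what happens if
the primary dies between applying an update and copying it.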

It's pretty obvious that a stateful component can't be made to behave just like
a stateless one... But maybe it's possible to get close? The benefit would be
to free the component programmer from the constraints of the stateless
programming model and let the system address those issues, while keeping the
scalability advantages of the stateless
fault-tolerance-becomes-load-balancing system.

You read all the way to here? Be proud of yourself! :-)

/Jonas
