Jonas,
It's not surprising that you can't find much on this because it's
a very difficult problem. Our server product does handle this, so in
that context I can explain how we achieve scalability and load-balancing
with Secant Extreme.
<vendor>
To start with, three fundamental components are needed, all tightly
integrated:
1. Container (or what we call a "service"). Each container is named and
holds one or more Home objects that can be located via naming services.
Containers can run in one of two modes: single-instance or multi-instance.
For load-balancing stateful objects, multi-instance mode is used. In this
mode, the server management layer can run many concurrent instances of the
service across a number of processes and host computers.
2. A transactional workspace. This workspace holds the entity objects
(entity beans, stateful session beans, and other persistent objects) that
have been modified in the course of a transaction. The contents of these
workspaces can be saved to secondary storage to implement session-level
recovery if the server crashes and the transaction needs to be
reconstructed.
3. A shared object cluster. This cluster is an in-memory container of
stateful objects that can span multiple transactions and process boundaries.
Entities that are read from the database are added to the cluster first
and then copied to the workspace if they are modified. When a transaction
completes, the dirtied objects in the workspace are "written through" the
cluster so that the state is updated in both the database and the cluster.
The cluster implementation requires a good distributed locking mechanism
as well as low-level communications routines to implement the replication
of state to other server processes.
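To make the workspace/cluster relationship concrete, here is a minimal,
single-process Java sketch of the read and write-through path described
above. Everything in it is an assumption for illustration: the class names
(Database, SharedCluster, Workspace) are hypothetical and are not Secant
Extreme's API, the "database" is just a map, and the distributed locking and
replication that the cluster needs are deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the database: committed entity state keyed by id.
class Database {
    final Map<String, String> rows = new HashMap<>();
}

// Hypothetical shared object cluster (single-process stand-in). A real one
// would also need distributed locking and replication to other processes.
class SharedCluster {
    private final Map<String, String> cache = new HashMap<>();
    private final Database db;

    SharedCluster(Database db) {
        this.db = db;
    }

    // Entities read from the database are added to the cluster first.
    String read(String id) {
        return cache.computeIfAbsent(id, db.rows::get);
    }

    // Write-through: update the database and the cluster together.
    void writeThrough(String id, String value) {
        db.rows.put(id, value);
        cache.put(id, value);
    }
}

// Hypothetical per-transaction workspace holding only modified entities.
class Workspace {
    private final SharedCluster cluster;
    private final Map<String, String> dirty = new HashMap<>();

    Workspace(SharedCluster cluster) {
        this.cluster = cluster;
    }

    // The transaction sees its own uncommitted copy, else the shared one.
    String read(String id) {
        return dirty.containsKey(id) ? dirty.get(id) : cluster.read(id);
    }

    // Ensure the entity is in the cluster, then keep a private modified copy.
    void modify(String id, String value) {
        cluster.read(id);
        dirty.put(id, value);
    }

    // On commit, dirtied objects are written through the cluster.
    void commit() {
        dirty.forEach(cluster::writeThrough);
        dirty.clear();
    }

    // On rollback, the private copies are simply discarded.
    void rollback() {
        dirty.clear();
    }
}
```

In this sketch a rollback only discards the workspace's private copies, so
neither the cluster nor the database ever sees uncommitted state.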
Load-balancing is achieved by implementing a dynamic naming service that
selects the least-loaded container implementing the home object of the
desired class. Once a container is activated in a transaction, all further
activities involving its entities and other requested home objects
implemented by that same container type will use the same container
instance. Thus all stateful objects created or changed for that container
(service) type within the transaction will be co-located in the same
process.
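The naming-service behavior above, least-loaded selection plus
per-transaction affinity, might look roughly like the following Java
sketch. The load metric (a count of active transactions per instance), the
"txId/serviceType" affinity key, and all class and method names are
hypothetical; a production naming service would presumably get its load
figures from the server management layer rather than tracking them itself.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical running instance of a multi-instance container ("service").
class ContainerInstance {
    final String serviceType;
    final String host;
    int activeTransactions; // crude load metric assumed for this sketch

    ContainerInstance(String serviceType, String host) {
        this.serviceType = serviceType;
        this.host = host;
    }
}

// Hypothetical dynamic naming service with per-transaction affinity.
class NamingService {
    private final List<ContainerInstance> instances = new ArrayList<>();
    // (transaction id, service type) -> instance already bound to that transaction
    private final Map<String, ContainerInstance> affinity = new HashMap<>();

    void register(ContainerInstance c) {
        instances.add(c);
    }

    ContainerInstance lookup(String txId, String serviceType) {
        String key = txId + "/" + serviceType;
        ContainerInstance bound = affinity.get(key);
        if (bound != null) {
            return bound; // same transaction, same service type: reuse the instance
        }
        // Otherwise pick the least-loaded instance of the requested service type.
        ContainerInstance least = instances.stream()
                .filter(c -> c.serviceType.equals(serviceType))
                .min(Comparator.comparingInt((ContainerInstance c) -> c.activeTransactions))
                .orElseThrow(() -> new NoSuchElementException("no instance of " + serviceType));
        least.activeTransactions++;
        affinity.put(key, least);
        return least;
    }

    // Drop the transaction's bindings and release the load they represented.
    void endTransaction(String txId) {
        Iterator<Map.Entry<String, ContainerInstance>> it = affinity.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, ContainerInstance> e = it.next();
            if (e.getKey().startsWith(txId + "/")) {
                e.getValue().activeTransactions--;
                it.remove();
            }
        }
    }
}
```

Keying affinity by transaction and service type is what gives the
co-location property: a second lookup of the same service type inside the
same transaction returns the instance that already holds that transaction's
workspace, while a new transaction is routed to whichever instance is least
loaded at that moment.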
Fault-tolerance is achieved via a transaction manager that detects when a
process has crashed and rolls back all workspaces in the remaining
processes that were involved in transactions shared with the crashed
process. Since the cluster spans multiple server processes, the "shared"
objects remain intact. In most cases, the transaction manager will restart
the failed process, which then resynchronizes with the cluster.
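A rough Java sketch of that rollback rule: the transaction manager records
which processes hold a workspace for each transaction, and when a process is
reported dead it rolls back the surviving workspaces of every transaction
that process took part in. How the crash is detected (heartbeats, lost
connections, and so on) and how a workspace is actually discarded are not
modeled here; TransactionManager, enlist, complete, and onProcessCrash are
hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical transaction manager tracking which server processes hold a
// workspace for each transaction. Crash detection itself is not modeled.
class TransactionManager {
    private final Map<String, Set<String>> participants = new HashMap<>();
    // "process:transaction" entries for workspaces rolled back (for inspection)
    final List<String> rolledBack = new ArrayList<>();

    void enlist(String txId, String processId) {
        participants.computeIfAbsent(txId, k -> new HashSet<>()).add(processId);
    }

    // Normal end of a transaction: stop tracking it.
    void complete(String txId) {
        participants.remove(txId);
    }

    // Called once failure detection reports that a process has died.
    void onProcessCrash(String crashed) {
        Iterator<Map.Entry<String, Set<String>>> it = participants.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Set<String>> e = it.next();
            if (!e.getValue().contains(crashed)) {
                continue; // this transaction never touched the crashed process
            }
            for (String p : e.getValue()) {
                if (!p.equals(crashed)) {
                    rollBackWorkspace(p, e.getKey());
                }
            }
            it.remove(); // the transaction is abandoned everywhere
        }
        // Shared cluster objects are not touched: only uncommitted workspaces go.
    }

    // Placeholder: a real system would tell process p to discard its workspace.
    private void rollBackWorkspace(String processId, String txId) {
        rolledBack.add(processId + ":" + txId);
    }
}
```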
For the clients (and HTTP sessions) that were involved in the failed
transaction, there are a number of recovery options. First, and most
simply, they can give up on what they were doing, inform the user, and
try again. Second, the transaction can be recovered by a session manager
that holds the state of the objects in all workspaces of the failed
transaction; for each workspace, it would recreate the workspace in a new
server process and attempt to re-acquire any shared references and locks.
If all of that succeeds, the user will see only a slight delay, and work
can proceed as if nothing happened.
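The second recovery option could be sketched like this in Java, assuming
the workspace snapshots have already been saved and that locks live in a
simple in-memory lock table standing in for the cluster's distributed
locks. WorkspaceSnapshot, LockTable, SessionManager, and recover are
hypothetical names. If any lock cannot be re-acquired, the sketch releases
what it took and reports failure, so the client can fall back to the first
option.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical saved copy of one workspace: modified entity state by id.
class WorkspaceSnapshot {
    final Map<String, String> dirtyEntities;

    WorkspaceSnapshot(Map<String, String> dirtyEntities) {
        this.dirtyEntities = dirtyEntities;
    }
}

// Hypothetical lock table standing in for the cluster's distributed locks.
class LockTable {
    private final Map<String, String> owners = new HashMap<>();

    // Succeeds if the entity is free or already held by the same owner.
    boolean tryLock(String entityId, String owner) {
        String current = owners.putIfAbsent(entityId, owner);
        return current == null || current.equals(owner);
    }

    void unlock(String entityId, String owner) {
        owners.remove(entityId, owner); // only removes if this owner holds it
    }
}

// Hypothetical session manager that rebuilds a failed transaction's workspaces.
class SessionManager {
    private final LockTable locks;

    SessionManager(LockTable locks) {
        this.locks = locks;
    }

    // Returns the recreated workspaces, or empty if any lock was unavailable.
    Optional<List<Map<String, String>>> recover(String newProcess,
                                                List<WorkspaceSnapshot> snapshots) {
        List<String> acquired = new ArrayList<>();
        List<Map<String, String>> rebuilt = new ArrayList<>();
        for (WorkspaceSnapshot s : snapshots) {
            for (String id : s.dirtyEntities.keySet()) {
                if (!locks.tryLock(id, newProcess)) {
                    // Undo the partial recovery; caller falls back to "give up and retry".
                    acquired.forEach(a -> locks.unlock(a, newProcess));
                    return Optional.empty();
                }
                acquired.add(id);
            }
            rebuilt.add(new HashMap<>(s.dirtyEntities)); // workspace recreated in newProcess
        }
        return Optional.of(rebuilt);
    }
}
```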
</vendor>
For the most part, Jonas, it still boils down to the basics of transaction
management, locking, object persistence, and a session manager that can
recover uncommitted data.
John Pompeii
CTO, Secant Technologies
----- Original Message -----
From: Jonas Wallenius <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Sent: Wednesday, October 27, 1999 6:51 PM
Subject: State and pooling?
> (First post and it's long - bear with me)
>
> I'm currently looking into things like fault tolerance and load balancing
> when building applications from components. This has included looking at
> the pros and cons of COM+, EJB, CORBA Components and Fault Tolerant CORBA.
>
> One thing which seems conspicuously missing from most documentation I've
> perused so far is replication of state combined with load balancing, as
> opposed to just for the sake of redundancy. Sounds crazy? Well, let me
> explain further...
>
> Take a system built solely on stateless components (let's ignore the
> technology: it doesn't really matter if it's COM+ or EJB or something
> else). To achieve scalability, one typically configures a cluster of
> servers, each running a replica of the stateless component. Since there is
> no state, fail-over becomes as easy as simply load balancing a request to a
> new server in case one of them fails, and as long as at least one member
> of the cluster is running, the system isn't considered 'down'.
> No network overhead is required to keep the components' state synchronized
> (at least not from the component's point of view - an underlying
> distributed database might of course disagree :-) since any component can
> serve any request. The cluster can thus scale well to lots of machines, and
> no machines are "wasted" as standby backup servers.
>
> Contrast this to the case where a component is stateful: The typical
> fault-tolerance configurations are the cold (logged state), warm
> (checkpointed state to standby replica), or hot (several replicas all
> serving requests) ones. (This is CORBA terminology but the general idea
> should be clear, I hope.)
>
> Now... with state, it seems we either end up with (at least) one machine
> simply standing as a backup machine in case something goes wrong with the
> primary one (cold, warm) or have two (or more) machines all doing the same
> work, and just one answer getting passed back to the client. Compared to
> the stateless case, this doesn't scale nearly as well, and it wastes
> computational resources since not all machines can serve requests all the
> time.
>
> So to finally get to the point: Does anyone here have knowledge of or
> opinions about how (if at all possible) one could design a system that
> allowed stateful components to be load-balanced in a large cluster, like
> stateless ones, but with state kept consistent across enough replicas
> (perhaps not necessarily all) that the system remains fault tolerant? The
> conflicting goals here are of course keeping recovery times after server
> failures to a minimum, while at the same time minimizing the network
> overhead required to keep components' state synchronized.
>
> It's pretty obvious that a stateful component can't be made to behave just
> like a stateless one... But maybe it's possible to get close? The benefit
> of this is of course to free the component programmer from the constraints
> of the stateless programming model and let the system address those
> issues, while maintaining the scalability advantages of the stateless
> fault-tolerance-becomes-load-balancing system.
>
> You read all the way to here? Be proud of yourself! :-)
>
> /Jonas
>
>
> ===========================================================================
> To unsubscribe, send email to [EMAIL PROTECTED] and include in the body
> of the message "signoff EJB-INTEREST". For general help, send email to
> [EMAIL PROTECTED] and include in the body of the message "help".