OK, in all those hours I have free as a student ;-) I have been thinking
about clustering, and this is what I've come up with.

Four parts of the EJB server need to become distributed for clustering to
work.  By 'clustering' here I mean this:

* In normal operation both nodes are usable.
* In the case that one node fails, clients which are using objects on that
  node might see weird things at this stage, but there are ways around
  that.
* In normal operation, client requests on the _home_ interface are load
  balanced.  The balancing policy is just that - a matter of policy which
  needn't concern us at this stage.
* It should be possible to add and remove nodes from the cluster
  transparently.

Here is how I propose to achieve this.

1 Semi-federated JNDI.
----------------------
I propose that some sort of delegation model be used for JNDI.  There is a
'public' JNDI interface and a 'private' JNDI interface.

The private interface is just an ordinary server as already exists, one on
each node, which keeps a list of the remote objects registered in that
node.  This means that, rather than having one central instance of each
service which co-ordinates the nodes, each node has its own instance of
each service and they all co-operate.  This includes JNDI.

The public interface is available on each node.  It keeps a list of all
other live nodes in the cluster, and delegates to them in a round-robin
way.  One request may be delegated to several servers before a match is
found for the name.  This inefficiency should only happen in exceptional
circumstances (a node is going down for maintenance etc).  The worst case
is when no match for the name can be found, in which case every node will
be checked.  This is reasonable, since it indicates either the deployment
is broken, or the user made an error.  In 'normal' circumstances, this
should not happen.  In 'ideal' operating conditions (every node has every
service available) there will be only one delegation for every request.
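
To make the delegation concrete, here is a rough Java sketch.  The class
names PrivateNaming and PublicNaming are made up for illustration and are
not existing classes in the server; the real thing would sit behind JNDI
rather than a plain map.  It only shows the round-robin starting point and
the fall-through to the next node on a miss.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one node's private JNDI server.
class PrivateNaming {
    private final Map<String, Object> bindings = new HashMap<>();

    void bind(String name, Object obj) { bindings.put(name, obj); }

    // Returns null when the name is not bound on this node.
    Object lookup(String name) { return bindings.get(name); }
}

// Hypothetical public interface: delegates round-robin to the live nodes.
class PublicNaming {
    private final List<PrivateNaming> liveNodes = new ArrayList<>();
    private int next = 0;        // index of the node to try first
    int lastDelegations = 0;     // delegations used by the most recent lookup

    synchronized void addNode(PrivateNaming node) { liveNodes.add(node); }

    synchronized Object lookup(String name) {
        lastDelegations = 0;
        int n = liveNodes.size();
        if (n == 0) return null;
        int start = next;
        next = (next + 1) % n;   // rotate the starting node for the next request
        for (int i = 0; i < n; i++) {
            lastDelegations++;
            Object found = liveNodes.get((start + i) % n).lookup(name);
            if (found != null) return found;
        }
        return null;             // worst case: every node was checked
    }
}
```

When every node has the name bound, lastDelegations is 1 for every
request; it only grows when the first node tried does not have the name.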

The list of other live nodes would probably be best maintained by a regular
broadcast ping.  This sounds inefficient, but it is how commercial vendors
(like Borland VisiBroker) achieve similar ends.
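
As a sketch of the bookkeeping side of the ping (not the network side),
each node could keep a table of when it last heard from every other node
and treat anyone silent for too long as dead.  LiveNodeTable and the
timeout value are assumptions of mine; timestamps are passed in explicitly
so the logic is easy to check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical table of nodes heard from recently via the broadcast ping.
class LiveNodeTable {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    LiveNodeTable(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    // Called whenever a ping from nodeId arrives.
    synchronized void pingReceived(String nodeId, long nowMillis) {
        lastSeen.put(nodeId, nowMillis);
    }

    // Nodes whose last ping is no older than the timeout, sorted by id.
    synchronized List<String> liveNodes(long nowMillis) {
        List<String> live = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() <= timeoutMillis) {
                live.add(e.getKey());
            }
        }
        Collections.sort(live);
        return live;
    }
}
```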

2 Distributed J2EE deployer.
----------------------------
When a package is deployed on one node, it should be transferred to every
other node and deployed there too.  It may be safe to assume that if one
node can deploy a package then the others can too; otherwise, check every
node and undeploy everywhere if any deployment fails.
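
The "check them all and undeploy on failure" variant might look roughly
like this.  NodeDeployer, InMemoryNode and ClusterDeployer are
hypothetical names; a real deployer would ship the package over the
network instead of recording a string.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical per-node deployer interface.
interface NodeDeployer {
    boolean deploy(String packageName);   // false if this node cannot deploy
    void undeploy(String packageName);
}

// In-memory stand-in for a node, used only to exercise the sketch.
class InMemoryNode implements NodeDeployer {
    final Set<String> deployed = new HashSet<>();
    private final boolean healthy;

    InMemoryNode(boolean healthy) { this.healthy = healthy; }

    public boolean deploy(String packageName) {
        if (!healthy) return false;
        deployed.add(packageName);
        return true;
    }

    public void undeploy(String packageName) { deployed.remove(packageName); }
}

class ClusterDeployer {
    private final List<NodeDeployer> nodes;

    ClusterDeployer(List<NodeDeployer> nodes) { this.nodes = nodes; }

    // Deploys on every node; if any node fails, undoes the ones already done.
    boolean deployEverywhere(String packageName) {
        List<NodeDeployer> done = new ArrayList<>();
        for (NodeDeployer node : nodes) {
            if (node.deploy(packageName)) {
                done.add(node);
            } else {
                for (NodeDeployer d : done) d.undeploy(packageName);
                return false;
            }
        }
        return true;
    }
}
```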

3 Distributed home interfaces.
------------------------------
The home interfaces need to be distributed to implement the sort of load
balancing I have in mind.  The idea is this.  Pretty much any operation
you can do on a home interface involves either activating an instance of a
bean or finding an instance.  So there are two activities:

3.1 Activating an instance.
---------------------------
Here the home interface finds the node which is least loaded (the
definition of 'least loaded' depends on your load balancing policy - it
might mean the next one in a list) and activates the instance on that
node, returning the reference to the client.
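
A minimal sketch of this, assuming 'least loaded' simply means 'fewest
active instances' (just one possible policy, picked for illustration).
BeanNode and ClusteredHome are hypothetical names, and the returned string
stands in for a remote reference.

```java
import java.util.List;

// Hypothetical node that can activate bean instances.
class BeanNode {
    final String name;
    int activeInstances = 0;

    BeanNode(String name) { this.name = name; }

    // Activates an instance here and returns a fake "reference" to it.
    String activate(String beanName) {
        activeInstances++;
        return beanName + "@" + name;
    }
}

// Hypothetical distributed home: picks a node, activates, returns the reference.
class ClusteredHome {
    private final List<BeanNode> nodes;   // assumed non-empty

    ClusteredHome(List<BeanNode> nodes) { this.nodes = nodes; }

    String create(String beanName) {
        BeanNode best = nodes.get(0);
        for (BeanNode n : nodes) {
            // Assumed policy: fewest active instances; ties go to the earlier node.
            if (n.activeInstances < best.activeInstances) best = n;
        }
        return best.activate(beanName);
    }
}
```

Swapping the policy means changing only the loop in create(), which fits
the idea that the policy needn't concern us at this stage.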

3.2 Finding an instance.
------------------------
This might well require a sort of broadcast to see if any node already has
an instance activated for that bean.  If so, return that instance.  If
not, use the activation described in 3.1.

This sounds messy, but the aim is to simplify things by restricting an
entity to only existing on one node at a time.  This makes distributed
transactions a hell of a lot easier, as far as I can see.
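
The find-then-activate logic can be sketched like this.  The loop over
nodes stands in for the broadcast, and EntityNode and EntityHome are
hypothetical names.  The point is only that an entity with a given primary
key ends up active on at most one node.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node holding the entity instances currently active on it.
class EntityNode {
    final String name;
    private final Map<String, String> active = new HashMap<>();

    EntityNode(String name) { this.name = name; }

    // Returns the reference if the entity is active here, else null.
    String findActive(String primaryKey) { return active.get(primaryKey); }

    String activate(String primaryKey) {
        String ref = primaryKey + "@" + name;
        active.put(primaryKey, ref);
        return ref;
    }

    int activeCount() { return active.size(); }
}

class EntityHome {
    private final List<EntityNode> nodes;   // assumed non-empty

    EntityHome(List<EntityNode> nodes) { this.nodes = nodes; }

    synchronized String findByPrimaryKey(String primaryKey) {
        // "Broadcast": ask every node whether it already has the entity.
        for (EntityNode n : nodes) {
            String ref = n.findActive(primaryKey);
            if (ref != null) return ref;
        }
        // Nobody has it: activate on the node with the fewest active entities.
        EntityNode best = nodes.get(0);
        for (EntityNode n : nodes) {
            if (n.activeCount() < best.activeCount()) best = n;
        }
        return best.activate(primaryKey);
    }
}
```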

4 Distributed transaction manager.
----------------------------------
I don't think this is as bad as it sounds, if we can do a two-phase commit
on the beans.  Then we need to find some way of propagating a unique
transaction ID in bean-to-bean calls.  The transaction management would
look something like this:

* Client makes a call on bean A.  Transaction is started (given
  appropriate transaction attribute).  Unique transaction ID is assigned
  (UUID might be necessary here).  Say this is XID 111.
* Bean A makes call on bean B.  New transaction is started on bean B with
  the same transaction ID, i.e. 111.  This is independent of which node the
  transaction is started on.  Bean A keeps a note that it has invoked bean
  B in this transaction.
* Method on bean B completes, but bean B stays associated with transaction
  111.
* Method on bean A completes, and we decide whether to commit or roll back.
     - Commit
       Do a two-phase commit on each node which bean A has accessed.  This
       commit needs to propagate through the entire invocation tree, so
       any bean which bean B invoked methods on in transaction 111 would
       have to be told to commit by bean B and so on.
     - Rollback
       Rollbacks propagate through the invocation tree in the same way as
       commits, but they don't have to be two-phase like commits do.
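
The steps above can be sketched in Java as follows.  TxBean and TxManager
are hypothetical names, the canCommit flag is a stand-in for whatever a
bean's prepare phase would really check, and the sketch assumes the
invocation tree has no cycles.  It is single-process, so it shows the
propagation through the tree rather than anything about the wire protocol.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

// Hypothetical stand-in for a bean taking part in a transaction.
class TxBean {
    final String name;
    final boolean canCommit;                         // pretend result of prepare
    final List<TxBean> invoked = new ArrayList<>();  // beans called in this transaction
    String xid;                                      // transaction this bean belongs to
    String state = "IDLE";

    TxBean(String name, boolean canCommit) {
        this.name = name;
        this.canCommit = canCommit;
    }

    void begin(String xid) {
        this.xid = xid;
        this.state = "ACTIVE";
    }

    // Bean-to-bean call: the callee joins under the caller's XID, and the
    // caller keeps a note that it invoked the callee.
    void call(TxBean callee) {
        callee.begin(xid);
        invoked.add(callee);
    }

    // Phase one: this bean and everything it invoked must all agree.
    boolean prepare() {
        if (!canCommit) return false;
        for (TxBean b : invoked) {
            if (!b.prepare()) return false;
        }
        state = "PREPARED";
        return true;
    }

    // Phase two: the commit propagates down the invocation tree.
    void commit() {
        state = "COMMITTED";
        for (TxBean b : invoked) b.commit();
    }

    // Rollback propagates the same way but needs only one phase.
    void rollback() {
        state = "ROLLED_BACK";
        for (TxBean b : invoked) b.rollback();
    }
}

class TxManager {
    static String newXid() { return UUID.randomUUID().toString(); }

    // Called when the outermost method completes; returns true on commit.
    static boolean complete(TxBean root) {
        if (root.prepare()) {
            root.commit();
            return true;
        }
        root.rollback();
        return false;
    }
}
```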

5 Remarks
---------
I think there are still a number of issues with this strategy, including:

* I have no training or experience in the area, so I'm guessing.  There
  are probably issues I've missed.
* It sounds to me like there are some performance problems with this
  strategy.  I'm not sure about this though.  This seems to me to be the
  easiest way of implementing clustering, but it may be that its
  performance makes it not worth the trouble.

These are just my ideas on how to implement clustering.  Ideas are almost
never good ideas when they are first proposed, because whoever proposed
them is blind to their flaws (or else they wouldn't have proposed them,
would they?).  So comment!  Let's get a strategy for clustering worked
out!  My signature notwithstanding, the more we work on this, the sooner
it'll happen.


Tom
-- 
"If you mess with something for long enough it will break." - Schmidt

