On Fri, 9 Mar 2001, Sacha Labourey wrote:

> Hello Tom,
> 
> > * In normal operation both nodes are usable.
> 
> I guess by "both" you mean "any" or "all"

Yes, sorry, I got stuck thinking in two-node mode; I meant all.  The
point was really that this is an active cluster, not a passive one.

> > * In the case that one node fails, clients which are using objects on that
> >   node might see weird things at this stage, but there are ways around
> >   that.
> > * In normal operation, client requests on the _home_ interface are load
> >   balanced.  The balancing policy is just that - a matter of policy which
> >   needn't concern us at this stage.
> > * It should be possible to add and remove nodes from the cluster
> >   transparently.
> 
> ok.

I think this last point is one which is being lost on some
people.  It means that there can be no central controller.  Everything
must work in a completely distributed way.  I have tried to describe how
this might work.  For instance, there cannot be a 'master' JNDI
repository which is replicated, nor a 'master' transaction
manager, message queue... anything.  This is why the private/public JNDI
tree became necessary: it lets some services retain an identity at
each node while beans are federated across the nodes, so that a lookup on
a bean may return an instance at any node.
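
To make this concrete, here is a rough Java sketch of how a node's
public lookup could hunt across the live nodes.  The class and the
naming-factory string are illustrative assumptions on my part, not an
existing API:

    import java.util.Hashtable;
    import java.util.List;
    import javax.naming.Context;
    import javax.naming.InitialContext;
    import javax.naming.NameNotFoundException;
    import javax.naming.NamingException;

    // Sketch only: the 'public' JNDI face of one node.  It holds the
    // list of live nodes (maintained by the ping mechanism, see below)
    // and round-robins a lookup across their 'private' JNDI servers
    // until one of them holds the name.
    public class PublicJndiDelegate {
        private final List<String> liveNodeUrls;
        private int next = 0;

        public PublicJndiDelegate(List<String> liveNodeUrls) {
            this.liveNodeUrls = liveNodeUrls;
        }

        public synchronized Object lookup(String name) throws NamingException {
            int nodes = liveNodeUrls.size();
            for (int i = 0; i < nodes; i++) {
                String url = liveNodeUrls.get(next++ % nodes);
                Hashtable<String, String> env = new Hashtable<>();
                env.put(Context.INITIAL_CONTEXT_FACTORY,
                        "org.jnp.interfaces.NamingContextFactory"); // assumed
                env.put(Context.PROVIDER_URL, url);
                try {
                    return new InitialContext(env).lookup(name);
                } catch (NameNotFoundException notHere) {
                    // this node does not hold the name; try the next one
                }
            }
            // worst case: every live node was checked and none matched
            throw new NameNotFoundException(name);
        }
    }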

> > 1. Semi-federated JNDI.
> > -----------------------
> > I propose that some sort of delegation model be used for JNDI.  There is a
> > 'public' JNDI interface and a 'private' JNDI interface.
> >
> > The private interface is just an ordinary server as already exists, one on
> > each node, which keeps a list of the remote objects registered in that
> > node.  This means that, rather than having one central instance of each
> > service which co-ordinates the nodes, each node has its own instance of
> > each service and they all co-operate.  This includes JNDI.
> >
> > The public interface is available on each node.  It keeps a list of all
> > other live nodes in the cluster, and delegates to them in a round-robin
> > way.  One request may be delegated to several servers before a match is
> > found for the name.  This inefficiency should only happen in exceptional
> > circumstances (a node is going down for maintenance, etc.).  The worst case
> > is when no match for the name can be found, in which case every node will
> > be checked.  This is reasonable, since it indicates that either the deployment
> > is broken or the user made an error.  In 'normal' circumstances, this
> > should not happen.  In 'ideal' operating conditions (every node has every
> > service available) there will be only one delegation for every request.
> 
> IMO, this is not efficient.
> 1st, it would be nice to be able to specify that a particular application
> needs to be deployed only to some particular nodes.

Yes, but that would not be a huge hurdle once we had this system
implemented.  In the absolute simplest case, a node which does not have a
bean deployed may simply report that its pool for that bean is 100% full,
so it will never receive invocations for it.
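
As a toy sketch of that reporting (all names are made up, and the real
metric and its transport are open questions):

    import java.util.HashMap;
    import java.util.Map;

    // Sketch: each node reports a load figure per bean to the balancer.
    // A node that never deployed the bean reports 1.0 ("pool 100% full"),
    // so the balancer never routes invocations for that bean to it.
    public class BeanLoadReport {
        private static class Pool {
            final int used, capacity;
            Pool(int used, int capacity) {
                this.used = used;
                this.capacity = capacity;
            }
        }

        private final Map<String, Pool> pools = new HashMap<>();

        public void register(String beanName, int used, int capacity) {
            pools.put(beanName, new Pool(used, capacity));
        }

        public double loadFor(String beanName) {
            Pool p = pools.get(beanName);
            if (p == null) {
                return 1.0; // bean not deployed on this node: always "full"
            }
            return (double) p.used / p.capacity;
        }
    }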

> 2nd, this "where-could-i-find-such-a-bean..." mechanism is probably not
> strict enough.

What do you mean by this?  How is it not 'strict' enough?

> BTW, you say that you want a private and a public JNDI interface. Why? Is it
> to optimize local calls?

No, as I said above, I want to be able to have both federated and
non-federated objects in JNDI.

Having thought this through a little more, what I really meant was that
objects in JNDI should be treated differently depending on whether they
are services or EJBs.  A service should have an identity at each node
within the cluster.  For a 10-node cluster, there will be 10 transaction
managers, which are co-operating to provide distributed
transactions.  Each one still retains its own identity.  So there is a
transaction manager in each 'private' JNDI, which is not visible to the
client anyway.  Then there are beans, which have to exist as one entity
across all the nodes.  Do you see what I am saying?  There are two
different sorts of clustering happening here.  One is many
individuals working independently to provide a service; the other is one
corporate body making up a service.  The 'public/private' naming might be a
bit confusing; it was just a way of implementing this distinction.
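
A sketch of what I mean at deployment time.  The subcontext names
'private' and 'public' and the bean name are illustrative, not a fixed
layout:

    import javax.naming.Context;
    import javax.naming.InitialContext;
    import javax.naming.NamingException;

    // Sketch: run once at node start-up.
    public class NodeBindings {
        public static void bindAll(Object localTxManager, Object accountHome)
                throws NamingException {
            Context root = new InitialContext();

            // Private tree: one transaction manager *per node*, each
            // retaining its own identity.  Clients never see this tree.
            Context priv = root.createSubcontext("private");
            priv.bind("TransactionManager", localTxManager);

            // Public tree: the bean is one logical entity across the whole
            // cluster; a lookup here may be answered by any node's instance.
            Context pub = root.createSubcontext("public");
            pub.bind("ejb/AccountHome", accountHome);
        }
    }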

> Have you read my previous proposition? I suggested something like yours, but
> we would have only one JNDI space, replicated in different places (possibly,
> and most probably, on each node), with the possibility (an extended JNDI
> semantic) of binding several (bean) references to the same JNDI name.
> Consequently:
>       - whichever JNDI node a client decides to speak to, that node knows
> exactly which nodes have subscribed a reference to this name, without
> prompting its neighbors
>       - local call optimisation is (can be) achieved
>       - if an application is deployed on only a few nodes (not all nodes), the
> related JNDI names of this application will receive bind requests only from
> those nodes (even though all nodes have this knowledge)

Yes, I saw this, but as I say, multiple references under a name are not
really enough to provide what I am trying to achieve.  What I propose
requires that some objects have different identities at different nodes,
while others have the same identity at every node.

> > The list of other live nodes would probably be best achieved by a regular
> > broadcast ping.  This sounds inefficient, but it is how commercial vendors
> > (like Borland VisiBroker) achieve similar ends.
> 
> Using a group communication framework would provide us with a solution to the
> replication issue I was just speaking about and, furthermore, would also
> provide a failure detector which can be parameterized (ping, broadcast,
> multicast, ...). (In fact, the failure-detection mechanism is a basic layer
> of the group communication problem.)
> 
> What do you think?

I like this group communication thingy, except that everyone I've
heard explain it seems to advocate a central controller,
which gets messy when the controller itself fails.  I have tried to
avoid this.
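
Something like the following is what I had in mind, every node being an
equal peer.  The group address, port and timings are made-up values:

    import java.net.DatagramPacket;
    import java.net.InetAddress;
    import java.net.MulticastSocket;
    import java.util.HashMap;
    import java.util.Map;

    // Sketch: every node both announces itself and listens for its
    // peers' announcements.  There is no coordinator to fail.
    public class HeartbeatMembership {
        private static final String GROUP = "230.0.0.1"; // illustrative
        private static final int PORT = 45566;           // illustrative
        private static final long INTERVAL_MS = 2000;
        private static final long TIMEOUT_MS = 3 * INTERVAL_MS;

        private final String nodeId;
        private final Map<String, Long> lastSeen = new HashMap<>();

        public HeartbeatMembership(String nodeId) {
            this.nodeId = nodeId;
        }

        public void start() throws Exception {
            final InetAddress group = InetAddress.getByName(GROUP);
            final MulticastSocket socket = new MulticastSocket(PORT);
            socket.joinGroup(group);

            // Sender thread: broadcast a ping at a fixed interval.
            new Thread(() -> {
                byte[] payload = nodeId.getBytes();
                DatagramPacket ping =
                        new DatagramPacket(payload, payload.length, group, PORT);
                try {
                    while (true) {
                        socket.send(ping);
                        Thread.sleep(INTERVAL_MS);
                    }
                } catch (Exception e) { /* socket closed: stop */ }
            }).start();

            // Receiver thread: note when each peer was last heard from.
            new Thread(() -> {
                byte[] buf = new byte[256];
                try {
                    while (true) {
                        DatagramPacket p = new DatagramPacket(buf, buf.length);
                        socket.receive(p);
                        String peer = new String(p.getData(), 0, p.getLength());
                        synchronized (lastSeen) {
                            lastSeen.put(peer, System.currentTimeMillis());
                        }
                    }
                } catch (Exception e) { /* socket closed: stop */ }
            }).start();
        }

        // A peer that has gone quiet past the timeout is presumed dead.
        public boolean isAlive(String peer) {
            synchronized (lastSeen) {
                Long t = lastSeen.get(peer);
                return t != null && System.currentTimeMillis() - t < TIMEOUT_MS;
            }
        }
    }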

Tom
--
"If you mess with something for long enough, it will break."
        - Schmidt's first law of engineering.
