On Aug 29, 2008, at 4:38 PM, David Blevins wrote:

Been looking into how to do service discovery between client/server and server/server.

We've got some code in the client to do failover on a server list but by default it really isn't wired up. It'd be nice to get something in so even if we don't support full clustered replication, at the very least we could support a bunch of servers that "work together" in a stateless fashion.

Looking at adding a new "multicast" server service that just advertises the URIs of the other network services available in the system. We'd probably want it off by default (maybe), but then a client could boot without being pointed to a specific server address and, in theory, find a server to talk to.
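
A rough sketch of what such an advertisement could look like, for illustration only. The class name, the "group|uri" message format, and the helper methods are all assumptions made for this example, not the actual service or its wire protocol. The server side sends a small datagram to a multicast address, and the client side keeps only the URIs advertised for its own group.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch: the names and the "group|uri" format are assumptions.
class DiscoverySketch {

    /** Encodes one advertisement as "group|uri" (assumed format). */
    static String encode(String group, URI service) {
        return group + "|" + service;
    }

    /** Returns the advertised URI if the message is for our group, else null. */
    static URI decode(String message, String expectedGroup) {
        int sep = message.indexOf('|');
        if (sep < 0) return null;                       // not an advertisement
        if (!message.substring(0, sep).equals(expectedGroup)) return null;
        try {
            return URI.create(message.substring(sep + 1));
        } catch (IllegalArgumentException e) {
            return null;                                // malformed URI, ignore
        }
    }

    /** Server side: sends one advertisement datagram to the multicast address. */
    static void announce(InetAddress multicastAddress, int port, String message)
            throws IOException {
        byte[] data = message.getBytes(StandardCharsets.UTF_8);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(data, data.length, multicastAddress, port));
        }
    }

    /** Client side: collects the servers heard for our group, in arrival order. */
    static Set<URI> collect(Iterable<String> heard, String group) {
        Set<URI> servers = new LinkedHashSet<>();
        for (String message : heard) {
            URI uri = decode(message, group);
            if (uri != null) servers.add(uri);          // duplicates collapse in the set
        }
        return servers;
    }
}
```

A client that has been given no server address would listen on the multicast address, feed each received message through `decode`, and try the first URI that comes back.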

More progress on the multicast discovery with grouping and failover.

For tracking and communicating a new server list for the cluster, I added a ClusterMetaData, which is similar to what the ServerMetaData was aiming to be. The ServerMetaData has two issues. First, it's tracked on a per-EJB-proxy basis, so any update to the list of servers in the cluster is only reflected in the proxy that was just used. All other proxies still hold onto the outdated list. Second, not all request types could be clustered and fail over: essentially only EJB requests could fail over, while JNDI and authentication requests could not. Now the ClusterMetaData version associated with the ServerMetaData is sent to the server *before* the main request, and the server can then send back a new list regardless of the request type.
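
To make the version exchange concrete, here is a minimal sketch. Only the name ClusterMetaData comes from the description above; the fields, the `check` and `exchangeVersion` methods, and the `SharedCluster` holder are assumptions for illustration, and the network round trip is simulated by a plain method call.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: everything except the name ClusterMetaData is assumed.
class ClusterSketch {

    /** Immutable snapshot of the cluster: a version plus the server list. */
    static final class ClusterMetaData {
        final long version;
        final List<URI> servers;

        ClusterMetaData(long version, List<URI> servers) {
            this.version = version;
            this.servers = Collections.unmodifiableList(new ArrayList<>(servers));
        }
    }

    /**
     * Server side: compares the version the client sent ahead of its request
     * with the current one. Returns null when the client is up to date,
     * otherwise the current snapshot to send back.
     */
    static ClusterMetaData check(long clientVersion, ClusterMetaData current) {
        return clientVersion == current.version ? null : current;
    }

    /** Client side: one holder shared by every proxy, so all see an update. */
    static final class SharedCluster {
        private volatile ClusterMetaData current;

        SharedCluster(ClusterMetaData initial) {
            this.current = initial;
        }

        ClusterMetaData get() {
            return current;
        }

        /** Runs before any request type; the server's state stands in for the wire call. */
        void exchangeVersion(ClusterMetaData serverState) {
            ClusterMetaData update = check(current.version, serverState);
            if (update != null) current = update;
        }
    }
}
```

Because the holder is shared rather than kept per proxy, a list picked up during a JNDI or authentication request is visible to the next EJB call as well.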


On the failover side, I essentially had to rewrite all of the synchronization related to connection management and shutting down the server. The code simply was not written so that it could be stopped cleanly from a client's perspective. It's now capable of a graceful shutdown, which is of course critical for failover. Active requests will be finished, new requests will be refused, and inactive connections with the client will be cleanly and immediately closed on the server side. There is a bit of a trick in this regard: even if one side closes perfectly and cleanly, the close may not be visible to the other side for quite a while. Still hacking on that. Chatted with Hiram about it a bit and it doesn't really sound like there is a "cure", only things you can do to overcome the symptoms.
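
The "finish active, refuse new" part of the shutdown can be sketched with a simple gate. This is an illustration under assumptions, not the rewritten server code: the class and method names are made up, and closing idle connections is left out.

```java
// Hypothetical sketch of a shutdown gate; names are assumptions.
class ShutdownGate {
    private int active;        // requests currently being processed
    private boolean stopping;  // set once shutdown has begun

    /** Called before handling a request; false means new work is refused. */
    synchronized boolean tryBegin() {
        if (stopping) return false;
        active++;
        return true;
    }

    /** Called when a request completes, successfully or not. */
    synchronized void end() {
        active--;
        notifyAll();           // wake a shutdown waiting for the drain
    }

    /**
     * Refuses new requests, then waits up to timeoutMillis for active ones
     * to finish. Returns true if everything drained in time.
     */
    synchronized boolean shutdown(long timeoutMillis) {
        stopping = true;
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (active > 0) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) return false;
            try {
                wait(remaining);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt
                return false;
            }
        }
        return true;
    }
}
```

A request handler would wrap its work in `tryBegin()` and `end()`, with `end()` in a `finally` block. The gate does nothing about the visibility problem described above, so a client still needs to treat a failed write on an old connection as a cue to fail over to another server.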

-David
