On Sep 12, 2008, at 12:45 PM, David Blevins wrote:
I originally had the version be a simple "increment by one" strategy, but eventually went with the value of System.currentTimeMillis(). It's possible for more than one server to be reachable via the ServerMetaData (e.g., multicast://), and each server has its own list and version number. Secondly, if a server is restarted, the version number will go back to zero and the client could be stuck thinking it has a more current list than the server.
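To make the restart problem concrete, here is a small sketch. It assumes the client refreshes its list only when the server's version is greater than its own; that rule and the class name VersionCheck are hypothetical, not taken from the OpenEJB client code.

```java
/**
 * Hypothetical sketch (not OpenEJB code): shows why a restart-reset counter
 * breaks a "server list is newer" check while a timestamp does not.
 */
public final class VersionCheck {
    private VersionCheck() {}

    /** Assumed client rule: refresh only if the server's version is greater. */
    public static boolean clientShouldRefresh(long clientVersion, long serverVersion) {
        return serverVersion > clientVersion;
    }

    public static void main(String[] args) {
        // Counter scheme: the client last saw version 5; the server restarts
        // and its counter starts over, reaching 1 after the next list change.
        long clientCounter = 5;
        long serverCounterAfterRestart = 1;
        System.out.println("counter refresh: "
                + clientShouldRefresh(clientCounter, serverCounterAfterRestart)); // false: client is stuck

        // Timestamp scheme: a restarted server stamps its list with the current
        // time, later than the client's stamp as long as the clock has not moved backwards.
        long clientStamp = System.currentTimeMillis();
        long serverStampAfterRestart = clientStamp + 30_000; // simulated restart 30 s later
        System.out.println("timestamp refresh: "
                + clientShouldRefresh(clientStamp, serverStampAfterRestart)); // true
    }
}
```

Running main prints "counter refresh: false" followed by "timestamp refresh: true".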
Time sometimes moves backwards on servers connected to a time server. How about something slightly more unique, like a 16-bit random number + the most significant 48 bits of the system time? 48 bits of milliseconds is about 9000 years.
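One reading of this proposal, sketched below: keep 48 bits of millisecond time (2^48 ms is roughly 8,900 years) in the most significant 48 bits of a 64-bit version, and fill the low 16 bits with a random number so that two servers, or a restarted server, are unlikely to produce the same value in the same millisecond. The bit layout and the class name ListVersion are assumptions for illustration only.

```java
import java.security.SecureRandom;

/** Hypothetical helper: 48 bits of time in the high bits, 16 random bits in the low bits. */
public final class ListVersion {
    private static final long TIME_MASK = (1L << 48) - 1; // keep the low 48 bits of the millis value
    private static final SecureRandom RANDOM = new SecureRandom();

    private ListVersion() {}

    /** Packs 48 bits of time and 16 random bits into one long. */
    public static long compose(long millis, int rand16) {
        // Stays positive (so signed comparison still orders by time) until
        // the time value reaches 2^47 ms, thousands of years after 1970.
        return ((millis & TIME_MASK) << 16) | (rand16 & 0xFFFFL);
    }

    /** Creates a new version from the current clock and a fresh 16-bit random value. */
    public static long next() {
        return compose(System.currentTimeMillis(), RANDOM.nextInt(1 << 16));
    }

    /** Extracts the 48-bit time component. */
    public static long timePart(long version) {
        return version >>> 16;
    }

    /** Extracts the 16-bit random component. */
    public static int randomPart(long version) {
        return (int) (version & 0xFFFFL);
    }
}
```

Note that the random bits address uniqueness, not clock skew: if time moves backwards, a new version can still compare lower than an old one.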
When a server shuts down, new connections are refused, existing connections not in mid-request are closed, any remaining connections are closed immediately after the request in progress completes, and clients can fail over gracefully to the next server in the list. If a server crashes, requests are retried on the next server in the list. This failover pattern is followed until there are no more servers in the list, at which point the client attempts a final multicast search (if it was created with a multicast PROVIDER_URL) before abandoning the request and throwing an exception to the caller. Currently, the failover is ordered but could very easily be made random. The multicast discovery aspect of the client adds a nice randomness to the selection of the first server that is perhaps somewhat "just". Theoretically, servers that are under more load will send out fewer heartbeats than servers with no load. This may not happen as theory dictates, but certainly as we get more EJB statistics data wired into the server functionality, we can pursue deliberate heartbeat throttling techniques that might make that theory really sing in practice.
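The ordered failover described above can be sketched as follows. This is not the OpenEJB client API: FailoverInvoker, Request, and the multicastSearch supplier (standing in for multicast discovery, null when the client was not created with a multicast PROVIDER_URL) are hypothetical names, and skipping servers already tried during the final search is an assumption.

```java
import java.io.IOException;
import java.net.URI;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical sketch of ordered failover with a final multicast search. */
public final class FailoverInvoker {
    /** Performs one request against one server; throws if the server refuses or fails. */
    @FunctionalInterface
    public interface Request<T> {
        T invoke(URI server) throws IOException;
    }

    private FailoverInvoker() {}

    public static <T> T invoke(List<URI> servers, Request<T> request,
                               Supplier<List<URI>> multicastSearch) throws IOException {
        IOException last = null;
        // Ordered failover: try each known server in list order.
        for (URI server : servers) {
            try {
                return request.invoke(server);
            } catch (IOException e) {
                last = e; // refused or crashed; move on to the next server
            }
        }
        // List exhausted: one final discovery round, only for multicast clients.
        if (multicastSearch != null) {
            for (URI server : multicastSearch.get()) {
                if (servers.contains(server)) {
                    continue; // already tried above
                }
                try {
                    return request.invoke(server);
                } catch (IOException e) {
                    last = e;
                }
            }
        }
        // Nothing left to try: abandon the request and report to the caller.
        throw new IOException("No servers available", last);
    }
}
```

Making the failover random instead of ordered would amount to shuffling a copy of the server list before the first loop.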
Very cool. -dain
