Rajith Attapattu wrote:
Hi All,

Chathura and Chamikara have posted the following proposal on the wiki.
http://wiki.apache.org/ws/FrontPage/Axis2/clustering_proposal

Next step is for us to figure out and document the demarcation points where
we would want to cluster.
Then we can start on an implementation.

Regards,

Rajith


Some observations

1. Try to use other people's work if you can. Getting clustering to work properly in the face of network failures is hard. We use a partition-aware tuple space for such things: Anubis [1,2].

2. Are you planning on saving state to the servlet context? It's usually the simplest way, as the app server vendor will have solved some of the problems.

3. Servlet state can be brittle against hot redeploy on a single machine if you update the implementations, and very brittle if you do a rolling cold redeploy across a cluster. Unless you can go offline, you need to do choreographed redeploys in which you partition the cluster and have the load balancer serve the old nodes until the new nodes are up and have it switched over.

4. Round robin sucks for performance if requests in the same session aren't biased towards the previous machine, purely for cache (including HDD and DB cache) reasons.

5. Round robin needs to use happyaxis.jsp or equivalent to decide where to route stuff. You cannot just rely on the presence of a machine as a liveness cue; you need to monitor the health of the operations.

6. Are your clusters going to be on the same site/network? What are the minimum network requirements, with WLAN at one end, and InfiniBand at the other?

7. How are you going to stop system management scaling at O(nodes) or worse? It can be worse unless your diagnostics are good at tracking down which machine has a problem, believe me.

8. Testing all of this gets hard indeed. Sometimes we have to resort to mathematical proofs of correctness.
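
Points 4 and 5 can be sketched roughly as below. This is only an illustration under my own assumptions, not anything from the proposal: the class name StickyBalancer, the route method and the health predicate are all made up for the example, and a real health check would probe something like happyaxis.jsp on each node rather than take a lambda.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Hypothetical sketch: round robin with session affinity and a health check. */
class StickyBalancer {
    private final List<String> nodes;
    // Assumed health probe, e.g. "did an HTTP GET of happyaxis.jsp succeed?"
    private final Predicate<String> healthy;
    private final Map<String, String> sessionToNode = new ConcurrentHashMap<>();
    private final AtomicInteger next = new AtomicInteger();

    StickyBalancer(List<String> nodes, Predicate<String> healthy) {
        this.nodes = new ArrayList<>(nodes);
        this.healthy = healthy;
    }

    /** Returns the node for this request, or null if no node passes the check. */
    String route(String sessionId) {
        // Prefer the node that served this session before, so its caches stay warm.
        String previous = sessionToNode.get(sessionId);
        if (previous != null && healthy.test(previous)) {
            return previous;
        }
        // Otherwise fall back to plain round robin over the healthy nodes.
        for (int i = 0; i < nodes.size(); i++) {
            int index = Math.floorMod(next.getAndIncrement(), nodes.size());
            String candidate = nodes.get(index);
            if (healthy.test(candidate)) {
                sessionToNode.put(sessionId, candidate);
                return candidate;
            }
        }
        sessionToNode.remove(sessionId);
        return null;
    }
}
```

Note that when a session is moved to another node, whatever state it had on the old node is only there if something has replicated it, which is the hard part of the whole exercise.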

Overall, you need to decide on your goals. Is it scalability or availability? Both can be done with clustering, but you need good awareness of the problems before you can get it right. A High Availability system will be robust against transient network failures, and may or may not support rolling redeployment. More to the point, an underlying design that is not robust against network outages is very hard to fix, and stops you doing fun things like downsizing or upsizing the nodes based on demand, or rerouting to different machines based on WS-A internals (*) and session info (i.e. per-customer and geographic selection).

The other thing is that achieving consistency of behaviour in your distributed system is hard. Whoever is implementing it needs to be able to argue about Lamport's papers on Byzantine generals, or Gray's experiences; otherwise they haven't got the background needed to get it right. My own skills in the area are limited, which is why I delegate. But I do know why it's hard.

1. Anubis is OSS, on our SourceForge project, so you could use it, but it is LGPL. While we are happy with you calling it from Apache code, I'm not sure that Apache is. If we can come up with an implementation-neutral API, we may be able to implement it, and so you could use it as your way of sharing state across a single-site cluster, preferably one with a decent Ethernet behind it.

2. I would think that a back-end neutral SOAP/HTTP load balancer with awareness of back-end availability and able to route on WS-A information is broadly useful to other SOAP stacks, including Axis1.x and XFire. Maybe it should be a separate project with a JMX management API for live configuration. And before you do it, look at what exists in terms of HTTP load balancing in the rest of Apache. There's mod_proxy in Apache HTTPD, and there's Tomcat's own rule-based load-balancer [3].

-Steve

[1,2] http://www.hpl.hp.com/techreports/2005/HPL-2005-72.html
http://www.smartfrog.org/releasedocs/smartfrogdoc/anubis/AnubisUserGuide.pdf
[3] http://tomcat.apache.org/tomcat-5.5-doc/balancer-howto.html#Using%20the%20balancer%20webapp

(*) Load balancing is one reason I don't like WS-A; you need to parse the doc to find the URL, unless the URL is the only thing you redirect on.
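
To make the cost concrete, here is a rough sketch of what a balancer would have to do per request to route on the wsa:To header. The class and method names (WsaToReader, readTo) are invented for the example, and it assumes the W3C WS-Addressing 1.0 namespace; messages written against older drafts use different namespace URIs. It uses only the JDK's DOM parser, and it looks for wsa:To anywhere in the document rather than only inside the SOAP header.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: pull the wsa:To address out of a SOAP envelope. */
class WsaToReader {
    // Assumption: WS-Addressing 1.0 namespace; older drafts differ.
    static final String WSA_NS = "http://www.w3.org/2005/08/addressing";

    /** Returns the text of the first wsa:To element, or null if there is none. */
    static String readTo(String soapEnvelope) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to match on namespace URI
            DocumentBuilder builder = factory.newDocumentBuilder();
            // The whole message is parsed before any routing decision can be made.
            Document doc = builder.parse(
                    new ByteArrayInputStream(soapEnvelope.getBytes(StandardCharsets.UTF_8)));
            NodeList to = doc.getElementsByTagNameNS(WSA_NS, "To");
            return to.getLength() == 0 ? null : to.item(0).getTextContent().trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable XML message", e);
        }
    }
}
```

A streaming parser that stops at the end of the header would be cheaper than building a DOM, but either way the balancer has to read into the body of the HTTP request instead of just looking at the request line.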
