Hi all,

today I would like to discuss some options for improving the load balancing support of Apache Synapse. Not all of my ideas have settled yet, and I may have missed some pieces of the current implementation, so I would like to get some feedback first and decided not to create a JIRA immediately. After our discussion I would like to summarize the results in a JIRA.
1) Where can I review the status of the endpoints of a loadbalance group? It should be possible to query the status of each endpoint via JMX. It should also be possible to get the number of configured as well as active endpoints of a load balance group via JMX. This would make meaningful external monitoring possible. For example, the user could define an alert if only 2 nodes are left, or if the ratio of available nodes drops below 20%, or something like this.

2) Another very useful feature would be the possibility to manually deactivate an endpoint of a load balancing group. If I understand it correctly, right now you have to remove the endpoint from the group and restart your server (or restart the cluster gracefully). Not very nice. To implement this, it might make sense to distinguish between three states: "active", "deactivated" and "manually deactivated". A manually deactivated endpoint can only be reactivated manually. Automatic retry will not be used for endpoints in that state.

3) Why did you choose to interpret a missing suspendDurationOnFailure as meaning that an endpoint will never be recovered after a failure? This does not match my intuition and expectations. Is this really a good default value? When would a user ever want this effect? Do I understand this wrong, or does the user have to restart the ESB to change the status back to "active"?

4) A static, configurable value for suspendDurationOnFailure is better than a hardcoded value, but it is still not optimal. The user always has to balance between different side effects, depending on the cause of the service outage. If you think about short network instabilities and you have a small cluster (think of two nodes), you are more or less forced to keep the check interval rather short. If a service then fails for some other reason and for a long period of time, this has a negative impact on performance, as the retries happen too often.
It would be much better to use a dynamic approach with a changing check interval: start with frequent retries (a short interval of a few seconds) and increase the interval up to a maximum value based on the number of tries. Maybe one could come up with a general purpose function for which the user can specify the arguments. This should allow preserving the existing behaviour while also supporting better suited algorithms.

5) When *all* nodes are inactive, the ESB currently creates a fault immediately. I am wondering whether this makes sense or not. Maybe it would be best if the user could decide between two options:

a) the current behaviour
b) first try all inactive endpoints until either one endpoint works or all endpoints have been tried once, and only then issue a fault

I'm not sure about this one, but the following happened during a test of a minimal service cluster with two nodes. The suspendDurationOnFailure had been set to 60 seconds. The first node was passive due to some maintenance, so all requests were served by the second node. Suddenly a short network outage happened, and the second node was marked as deactivated. It was reachable again within the next second, but the ESB still treated it as passive. Effectively the whole system was down for one minute. So you have to think about a shorter check interval, which in turn would be bad for the server that is down for maintenance. If the ESB had done one additional round of retries, it would have detected that the endpoint was in fact already up again.

Now I hope to receive a lot of comments and feedback. Maybe we can work together to make improvements in this area. Please point me to any existing functionality I may have missed!

Regards,
Eric
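P.S. To make the idea from point 4 a bit more concrete, here is a rough sketch in Java of what such a general purpose function could look like. It is only an illustration, not existing Synapse code: the class name SuspendBackoff, the method suspendMillis and the parameters initialMillis, factor and maxMillis are all made up for this example. It assumes a simple exponential increase that is capped at a maximum.

```java
/**
 * Hypothetical helper, not part of Synapse: computes how long an endpoint
 * should stay suspended before the next retry.
 */
final class SuspendBackoff {

    private SuspendBackoff() {
        // static helper only
    }

    /**
     * @param failedRetries number of retries that have already failed
     *                      (0 for the first suspension after a failure)
     * @param initialMillis suspend duration used for the first suspension
     * @param factor        multiplier applied per failed retry (>= 1.0);
     *                      1.0 gives a constant interval, i.e. today's
     *                      static suspendDurationOnFailure behaviour
     * @param maxMillis     upper bound for the suspend duration
     * @return the suspend duration in milliseconds
     */
    static long suspendMillis(int failedRetries, long initialMillis,
                              double factor, long maxMillis) {
        if (failedRetries < 0 || initialMillis <= 0
                || factor < 1.0 || maxMillis < initialMillis) {
            throw new IllegalArgumentException("invalid backoff parameters");
        }
        // Exponential growth: initial * factor^failedRetries, capped at max.
        double duration = initialMillis * Math.pow(factor, failedRetries);
        return (long) Math.min(duration, (double) maxMillis);
    }
}
```

With initialMillis = 2000, factor = 2.0 and maxMillis = 60000, the intervals for 0 to 5 failed retries would be 2000, 4000, 8000, 16000, 32000 and 60000 ms. A short network glitch is then detected as over within a few seconds, while a node that is down for maintenance is only probed once per minute. Setting factor to 1.0 keeps the interval constant, so the existing behaviour could be preserved as a special case.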
