Hi Avoy,

Good question.


First, in a typical distributed system there is no way to get that information reliably. In other words, it depends on what rate of "false positives" (a server reported as available whose call then fails anyway) you can live with.

- If the requirement is to be 100% sure, there is one way and one way only: make the call you want to make and check whether it fails or succeeds. There is no way around actually making the call if 100% correctness is expected, because the server in question could go down at ANY time, even if it reported itself as available just a microsecond before.

- If the requirement is to be sure most of the time, and a small percentage of errors is acceptable, send a ping before the real request(s). There is still a small chance that the server goes down after responding to the ping, but the probability is typically very low.

- If it is OK to know it only approximately and there is some fallback mechanism in place, one could think of an information store that holds all the info about where the servers live and what services they offer. That directory could be centralized or distributed; a typical example would be Apache ZooKeeper. Of course, by adding more components to the system, one also gets more potential sources of problems.
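
To illustrate the first and third options together, here is a minimal Python sketch, not a definitive implementation. It assumes you already have a list of candidate servers (for example, from such a directory) and simply makes the real call on each until one succeeds. The names call_with_fallback and do_call are made up for this example, and it assumes that connection-level failures surface as OSError subclasses.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


def call_with_fallback(servers: Iterable[str], do_call: Callable[[str], T]) -> T:
    """Make the real call on each candidate server until one succeeds.

    `do_call` is a hypothetical callable that performs the actual request
    against one server and raises an OSError subclass (ConnectionError,
    timeout, ...) if the server cannot be reached.
    """
    last_error: Optional[OSError] = None
    for server in servers:
        try:
            # The only reliable availability check is the call itself.
            return do_call(server)
        except OSError as exc:
            # Remember the failure and fall back to the next candidate.
            last_error = exc
    raise ConnectionError("no server could complete the call") from last_error
```

The list of servers is only a hint here. A stale entry costs one failed attempt, which is why this approach tolerates an approximate directory.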


Second, the call may fail even if the round trip itself succeeds. For example, the client asks for some information, but the server needs to retrieve data from a backend database to fully process the call. That database may be unavailable, causing the call to fail although it "technically" succeeded. The point is that, from the client's perspective, it makes little difference why the call failed.
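
On the client side this could look roughly like the following Python sketch. BackendUnavailable is a hypothetical application-level error invented for this example (your service would define its own), and try_call is likewise just an illustrative helper. The point is only that both kinds of failure end up on the same code path.

```python
from typing import Any, Callable, Tuple


class BackendUnavailable(Exception):
    """Hypothetical error the server returns when its own backend is down."""


def try_call(do_call: Callable[[], Any]) -> Tuple[bool, Any]:
    """Run the call and report (succeeded, result_or_error).

    Transport failures (OSError and its subclasses) and application-level
    failures (BackendUnavailable) are deliberately handled the same way.
    """
    try:
        return True, do_call()
    except (OSError, BackendUnavailable) as exc:
        return False, exc
```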


To be able to tell more and/or to give more specific advice, we will need to know more about what you are actually planning to do.

Have fun,
JensG
