Ted & Billie, I'm updating the YARN-2683 registry documentation, including some guidelines on how to handle a failure to communicate with a remote server.
The Slider REST client has this problem itself. Currently the REST client fails immediately; if a new attempt is then made to look up the AM service API URL, it will pick up the new value. The API client is therefore not handling the rebinding itself, which is a bit of a get-out: I'm relying on the AM not failing very often. Provided the clients don't hold onto their client API objects for long, they "should" be OK. What do HBase and Accumulo do here? I know they publish their binding info via ZK, but how do they fail over?
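For comparison, here is a rough sketch of what handling the rebinding inside the client could look like: drop the cached URL on a connection failure, look it up again, and retry. This is not what the Slider REST client does today. `RebindingClient`, `lookup_url` and `operation` are hypothetical names for illustration, with `lookup_url` standing in for whatever registry lookup returns the AM's current API URL. There is deliberately no backoff or sleep between attempts.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RebindingClient:
    """Hypothetical wrapper that re-resolves a service URL after a failure.

    `lookup_url` is an assumed stand-in for a registry lookup; it is not
    the real registry or Slider client API.
    """

    def __init__(self, lookup_url: Callable[[], str], max_attempts: int = 3):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._lookup_url = lookup_url
        self._max_attempts = max_attempts
        self._url: Optional[str] = None  # resolved lazily, cleared on failure

    def call(self, operation: Callable[[str], T]) -> T:
        """Run `operation(url)`, rebinding and retrying on ConnectionError."""
        last_error: Optional[ConnectionError] = None
        for _ in range(self._max_attempts):
            if self._url is None:
                # Fresh lookup: picks up the new URL if the AM has restarted.
                self._url = self._lookup_url()
            try:
                return operation(self._url)
            except ConnectionError as e:
                # Assume the binding is stale; forget it and try again.
                self._url = None
                last_error = e
        assert last_error is not None
        raise last_error
```

A caller would pass in its own lookup function and wrap each REST call in `client.call(...)`. Whether a given operation is safe to retry (idempotent or not) is a separate question this sketch ignores.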
