For Accumulo, like you said, information is published in ZK for clients
to find. Thinking about just the Accumulo master process (tabletservers
follow the same principle but in a slightly different way), clients will
cache that location from ZK and then on some RPC transport failure (e.g.
ConnectException -- Thrift exception in Accumulo's case), the client
will invalidate that cache, refresh the location from ZK and try again.
The exit is either the client gets a connection to the master after
retrying enough, or the code just gives up. I think we tend to keep
spinning in Accumulo (perhaps more than we should) which hides
"expected" failures from clients completely (clients don't have to be
aware that they're talking to a new master than they were before), but
it can make transport issues harder to diagnose.
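That cache-invalidate-refresh-retry loop can be sketched roughly as below. This is not Accumulo's actual client code; LocationSource and Rpc are hypothetical stand-ins for the ZK lookup and the Thrift call, and the bounded retry count replaces Accumulo's tendency to keep spinning:

```java
import java.io.IOException;
import java.net.ConnectException;

// Sketch of a client that caches the master location and, on a transport
// failure, invalidates the cache, refreshes from ZK, and retries.
class MasterLocator {
    interface LocationSource { String fetchLocation(); }          // stand-in for the ZK read
    interface Rpc<T> { T call(String location) throws IOException; } // stand-in for the Thrift call

    private final LocationSource source;
    private final int maxAttempts;
    private String cached;                                        // cached master location

    MasterLocator(LocationSource source, int maxAttempts) {
        this.source = source;
        this.maxAttempts = maxAttempts;
    }

    <T> T invoke(Rpc<T> rpc) throws IOException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (cached == null) {
                cached = source.fetchLocation();                  // refresh location from ZK
            }
            try {
                return rpc.call(cached);                          // success: exit the loop
            } catch (ConnectException e) {
                cached = null;                                    // invalidate the stale cache
                last = e;
            }
        }
        throw last != null ? last : new IOException("no attempts made"); // gave up
    }
}
```

The exit conditions match the description above: either the RPC eventually succeeds against a freshly looked-up location, or the loop exhausts its attempts and surfaces the last transport failure.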
Steve Loughran wrote:
Ted & Billie,
I'm updating the YARN-2683 registry documentation, including some guidelines on
how to handle failure to communicate with a remote server.
This is a problem the Slider REST client has itself; currently the REST
client fails immediately, and if a new attempt is made to look up the
AM service API URL it will pick up a new value.
The API client is therefore not handling the rebinding itself. Which is a bit
of a get-out; I'm relying on the AM not failing very often. Provided
the clients don't hold onto their client API objects for long, they "should" be
OK.
What do HBase and Accumulo do here? I know they publish their binding info via
ZK, but how do they fail over?