Some background first.
HBASE-28321 addresses the problem that both master and region server
implement ClientMetaService, but they may use different server
principals. In our current client implementation we can only configure
one principal pattern, so the client either can not connect to master
or can not connect to region server.

In the design doc[1], we described a way to deal with the problem:
send a special preamble header to the rpc server, so that the server
can tell us the correct server principal. We also described a fallback
logic: if we receive a FatalConnectionException with an unexpected
header, we know the remote side is an old server, and then we randomly
choose a server principal to connect.

But when implementing, I found out that the fallback logic is not
easy. When the server sends a FatalConnectionException back, our
current implementation uses this exception to fail all the pending rpc
calls. And even if we remove this logic, the server will close the
connection, which still causes all the pending rpc calls to fail.

In general, I think there are 4 ways to deal with this problem.

1. Let it go. Even if we have the fallback logic, it could still fail
if we choose the wrong server principal at client side. The feature is
completely broken between old client and old server under this
scenario anyway, so at least we have fixed it for new client and new
server. And in our compatibility guide, we do not guarantee the
compatibility between new client and old server.
2. Set a flag in the RpcConnection instance, so that when the upper
layer issues a retry, we skip the security preamble call and just
randomly select a server principal to use.
3. Building on #2, issue a special exception for this failure, and in
AbstractRpcClient, do not finish the stub call with this exception;
instead, issue a new call so that the retry logic is hidden from the
upper layer.
4. Retry at rpc connection level.

For #1, we do not need to do anything special.
For #2 and #3, we need to do more hacking work, but I can still
imagine how to achieve this in our code base.
For #4, I do not have ideas on how to achieve this yet...
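
To make #2 a bit more concrete, here is a rough sketch of the
per-connection state I have in mind. It is for illustration only:
PrincipalSelector and all of its method names are made up for this
mail and are not the real RpcConnection API, and the candidate list is
assumed to come from the configured principal patterns.

```java
import java.util.List;
import java.util.Random;

/**
 * Hypothetical stand-in for the state we would keep in an RpcConnection.
 * None of these names exist in the HBase code base.
 */
final class PrincipalSelector {
  // Candidate server principals; assumed to be non-empty.
  private final List<String> candidates;
  private final Random random;
  // Set once the remote side rejected the security preamble call,
  // i.e. we consider it an old server.
  private boolean skipSecurityPreamble = false;

  PrincipalSelector(List<String> candidates, Random random) {
    this.candidates = List.copyOf(candidates);
    this.random = random;
  }

  /** To be called when the connection fails with the "unexpected header" error. */
  void onPreambleRejected() {
    skipSecurityPreamble = true;
  }

  /** The retry issued by the upper layer checks this before connecting again. */
  boolean shouldSendSecurityPreamble() {
    return !skipSecurityPreamble;
  }

  /**
   * Picks the principal to use for the SASL negotiation.
   * serverReported is the principal returned by the preamble call,
   * or null if the preamble call was skipped.
   */
  String choosePrincipal(String serverReported) {
    if (!skipSecurityPreamble && serverReported != null) {
      return serverReported;
    }
    // Old server: fall back to a random candidate, which may be the wrong one.
    return candidates.get(random.nextInt(candidates.size()));
  }
}
```

The first attempt still fails all the pending rpc calls; only the
retry from the upper layer benefits from the flag. #3 would reuse the
same flag but reissue the call inside AbstractRpcClient instead.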

Thoughts? Thanks.

[1] https://docs.google.com/document/d/1Cu-qzAdBGyBKM07aQP06RM0oeFSLPGtQFWuV_TDyBNg/edit?usp=sharing
