Some background first. HBASE-28321 addresses the problem that the master and the region server both implement ClientMetaService, but they may use different server principals. In our current client implementation we can only configure one principal pattern, so the client either cannot connect to the master or cannot connect to the region server.
In the design doc[1], we described a way to deal with this problem: the client sends a special preamble header to the rpc server, and the server tells us the correct server principal. We also described a fallback: if we receive a FatalConnectionException with an unexpected header, we know the remote side is an old server, and we then randomly choose a server principal to connect with.

But when implementing this, I found that the fallback logic is not easy. When a FatalConnectionException is sent back, our current implementation uses this exception to fail all the pending rpc calls. And even if we remove this logic, the server will close the connection, which still causes all the pending rpc calls to fail.

In general, I think there are 4 ways to deal with this problem.

1. Let it go. Even if we have the fallback logic, it could still fail if we choose the wrong server principal at client side, and the feature is completely broken between old client and old server under this scenario. At least we have fixed it for new client and new server. And in our compatibility guide, we do not guarantee compatibility between new client and old server.
2. Set a flag in the RpcConnection instance. When the upper layer issues a retry, we skip the security preamble call and just randomly select a server principal to use.
3. Based on #2's effort, issue a special exception for this failure, and in AbstractRpcClient, do not finish the stub call with this exception. Instead, just issue a new call, to hide the retry logic from the upper layer.
4. Retry at rpc connection level.

For #1, we do not need to do anything special. For #2 and #3, we need to do more hacking work, but I can still imagine how to achieve this in our code base. For #4, I do not have ideas on how to achieve this yet...

Thoughts? Thanks.

1. https://docs.google.com/document/d/1Cu-qzAdBGyBKM07aQP06RM0oeFSLPGtQFWuV_TDyBNg/edit?usp=sharing
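
P.S. To make #2 a bit more concrete, here is a rough, synchronous sketch in Java. It is only an illustration under assumptions: the names PreambleFallbackSketch, Server, Connection, UnexpectedPreambleException and callWithRetry are all hypothetical and are not existing HBase classes, and the real RpcConnection is asynchronous and considerably more involved. The only idea it shows is that the first failed preamble sets a flag on the connection, and the retry issued by the upper layer then skips the preamble and picks a principal at random.

```java
import java.util.List;
import java.util.Random;

final class PreambleFallbackSketch {

  /** Hypothetical stand-in for the fatal error an old server returns for an unknown header. */
  static class UnexpectedPreambleException extends RuntimeException {
    UnexpectedPreambleException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical server model: a new server reports its principal, an old one throws. */
  interface Server {
    String askPrincipal();
  }

  /** Hypothetical connection that carries the fallback flag described in #2. */
  static class Connection {
    private final Server server;
    private final List<String> candidates;
    private final Random random;
    private boolean skipSecurityPreamble = false;

    Connection(Server server, List<String> candidates, Random random) {
      this.server = server;
      this.candidates = candidates;
      this.random = random;
    }

    boolean isSkippingPreamble() {
      return skipSecurityPreamble;
    }

    String choosePrincipal() {
      if (skipSecurityPreamble) {
        // Fallback path: the remote side is assumed to be an old server, so guess.
        return candidates.get(random.nextInt(candidates.size()));
      }
      try {
        return server.askPrincipal();
      } catch (UnexpectedPreambleException e) {
        // Remember the failure for the next attempt, but still fail this one.
        skipSecurityPreamble = true;
        throw e;
      }
    }
  }

  /** Stand-in for the upper layer's retry loop. */
  static String callWithRetry(Connection conn, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    UnexpectedPreambleException last = null;
    for (int attempt = 0; attempt < maxAttempts; attempt++) {
      try {
        return conn.choosePrincipal();
      } catch (UnexpectedPreambleException e) {
        last = e;
      }
    }
    throw last;
  }
}
```

Note that the randomly chosen principal can still be the wrong one, which is the same limitation mentioned under #1. For #3, the same flag would be used, but the retry loop would live inside the rpc client instead of the upper layer.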