[
https://issues.apache.org/jira/browse/HDFS-14442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16935991#comment-16935991
]
Erik Krogen commented on HDFS-14442:
------------------------------------
I see. So we have this block:
{code}
private URI getCurrentNamenodeAddress(Path target) throws IOException {
  //String nnAddress = null;
  Configuration conf = getConf();
  //get the filesystem object to verify it is an HDFS system
  final FileSystem fs = target.getFileSystem(conf);
  if (!(fs instanceof DistributedFileSystem)) {
    System.err.println("FileSystem is " + fs.getUri());
    return null;
  }
  return DFSUtil.getInfoServer(HAUtil.getAddressOfActive(fs), conf,
      DFSUtil.getHttpClientScheme(conf));
}
{code}
This attempts to find the active NameNode via {{HAUtil.getAddressOfActive}}. In
this case it returns the observer instead, which works fine until a write
({{-delete}}) is attempted, at which point it fails.
I see two viable approaches:
# Leverage {{HAUtil.getProxiesForAllNameNodesInNameservice}} to fetch proxies
for all of the NameNodes, then check their state. I think this is the same as
what you pasted above. This should be possible from within {{HAUtil}} as far as
I can tell?
# Enhance the APIs provided by {{AbstractNNFailoverProxyProvider}} to include
something like {{getActiveInvocationHandler}}, {{getActiveAddress}}, etc. and
delegate the responsibility to the proxy provider.
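To make the first approach concrete, here is a minimal, self-contained sketch of the "check every NameNode's state" idea. It does not use the real Hadoop classes: {{HAState}} and {{NameNodeInfo}} are hypothetical stand-ins for the HA state and the per-NameNode proxy information, and the host names are made up. In real code the list would presumably come from {{HAUtil.getProxiesForAllNameNodesInNameservice}} and each state would be fetched over RPC.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ActiveNameNodeFinder {

  /** Hypothetical stand-in for the HA states a NameNode can report. */
  enum HAState { ACTIVE, STANDBY, OBSERVER }

  /** Hypothetical stand-in for one NameNode proxy plus its RPC address. */
  static final class NameNodeInfo {
    final InetSocketAddress address;
    final HAState state;

    NameNodeInfo(String host, int port, HAState state) {
      // Unresolved on purpose: the sketch must not depend on DNS.
      this.address = InetSocketAddress.createUnresolved(host, port);
      this.state = state;
    }
  }

  /**
   * Returns the address of the first NameNode reporting ACTIVE, or empty
   * if none does (for example, in the middle of a failover).
   */
  static Optional<InetSocketAddress> findActive(List<NameNodeInfo> nameNodes) {
    for (NameNodeInfo nn : nameNodes) {
      if (nn.state == HAState.ACTIVE) {
        return Optional.of(nn.address);
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    // The observer is listed first to mirror the failure described above:
    // whichever NameNode the client currently talks to is not necessarily
    // the active one.
    List<NameNodeInfo> nameNodes = Arrays.asList(
        new NameNodeInfo("nn-observer.example.com", 8020, HAState.OBSERVER),
        new NameNodeInfo("nn-active.example.com", 8020, HAState.ACTIVE),
        new NameNodeInfo("nn-standby.example.com", 8020, HAState.STANDBY));

    System.out.println(findActive(nameNodes)
        .map(InetSocketAddress::getHostString)
        .orElse("no active NameNode found"));
  }
}
```

Returning an {{Optional}} rather than {{null}} forces the caller to handle the case where no NameNode is active at the moment of the check. As the existing javadoc already warns, the result is only a snapshot and can become stale after a failover.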
> Disagreement between HAUtil.getAddressOfActive and
> RpcInvocationHandler.getConnectionId
> ---------------------------------------------------------------------------------------
>
> Key: HDFS-14442
> URL: https://issues.apache.org/jira/browse/HDFS-14442
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.0
> Reporter: Erik Krogen
> Priority: Major
>
> While working on HDFS-14245, we noticed a discrepancy in some proxy-handling
> code.
> The description of {{RpcInvocationHandler.getConnectionId()}} states:
> {code}
> /**
>  * Returns the connection id associated with the InvocationHandler instance.
>  * @return ConnectionId
>  */
> ConnectionId getConnectionId();
> {code}
> It does not make any claims about whether this connection ID will be an
> active proxy or not. Yet in {{HAUtil}} we have:
> {code}
> /**
>  * Get the internet address of the currently-active NN. This should rarely be
>  * used, since callers of this method who connect directly to the NN using the
>  * resulting InetSocketAddress will not be able to connect to the active NN if
>  * a failover were to occur after this method has been called.
>  *
>  * @param fs the file system to get the active address of.
>  * @return the internet address of the currently-active NN.
>  * @throws IOException if an error occurs while resolving the active NN.
>  */
> public static InetSocketAddress getAddressOfActive(FileSystem fs)
>     throws IOException {
>   if (!(fs instanceof DistributedFileSystem)) {
>     throw new IllegalArgumentException("FileSystem " + fs + " is not a DFS.");
>   }
>   // force client address resolution.
>   fs.exists(new Path("/"));
>   DistributedFileSystem dfs = (DistributedFileSystem) fs;
>   DFSClient dfsClient = dfs.getClient();
>   return RPC.getServerAddress(dfsClient.getNamenode());
> }
> {code}
> Here the call to {{RPC.getServerAddress()}} eventually reaches
> {{RpcInvocationHandler#getConnectionId()}}, via {{RPC.getServerAddress()}} ->
> {{RPC.getConnectionIdForProxy()}} ->
> {{RpcInvocationHandler#getConnectionId()}}. {{HAUtil}} appears to assume,
> incorrectly, that {{RpcInvocationHandler}} will necessarily return
> an _active_ connection ID. {{ObserverReadProxyProvider}} is a
> counter-example, since its current connection ID may be pointing at,
> for example, an Observer NameNode.