[ 
https://issues.apache.org/jira/browse/HDFS-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16969457#comment-16969457
 ] 

Erik Krogen commented on HDFS-14963:
------------------------------------

Very interesting proposal [~xudongcao]!

For the logging issue you discussed in (2), I think this should be fixed 
regardless of the status of this JIRA. Even if the state gets shared across 
clients on the same machine, we shouldn't be printing such unnecessary log 
statements. I would be happy to help with reviews if this is something you want 
to tackle (in a separate JIRA).

For (1), I think it makes sense. This could provide some of the benefits of 
{{IPFailoverProxyProvider}}, without the extra operational overhead of 
maintaining a VIP. This can also be useful for {{ObserverReadProxyProvider}}, 
which currently searches for both the ANN and the Observer NN.

We should think carefully about the security implications of this. The file, 
presumably, is in a world-writable location to allow for the cache to be shared 
among different users. Is there any risk of a malicious user placing 
information in this file that could be harmful? Any race conditions which could 
arise?
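
As a starting point for that discussion, here is a minimal sketch of one way a client could limit both concerns. It assumes the cache file holds nothing but the NameNode index as decimal text. The reader treats the file as untrusted input and accepts only a small regular file containing an integer in range; the writer publishes updates by renaming a temporary file, so a concurrent reader should not observe a half-written value. {{ActiveIndexCache}} and its methods are hypothetical names for illustration, not existing Hadoop APIs.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.OptionalInt;

// Hypothetical helper, not part of Hadoop.
final class ActiveIndexCache {
  private ActiveIndexCache() {}

  /** Returns the cached index only if the file holds a plain integer in [0, nnCount). */
  static OptionalInt read(Path cacheFile, int nnCount) {
    try {
      // Reject symlinks, directories, missing files, and oversized files.
      if (!Files.isRegularFile(cacheFile, LinkOption.NOFOLLOW_LINKS)
          || Files.size(cacheFile) > 16) {
        return OptionalInt.empty();
      }
      String text =
          new String(Files.readAllBytes(cacheFile), StandardCharsets.US_ASCII).trim();
      int index = Integer.parseInt(text);
      return (index >= 0 && index < nnCount)
          ? OptionalInt.of(index)
          : OptionalInt.empty();
    } catch (IOException | NumberFormatException e) {
      // Any unreadable or malformed cache is treated as "no cache".
      return OptionalInt.empty();
    }
  }

  /** Writes the index to a temp file in the same directory, then renames it into place. */
  static void write(Path cacheFile, int index) throws IOException {
    Path dir = cacheFile.toAbsolutePath().getParent();
    Path tmp = Files.createTempFile(dir, "nn-index", ".tmp");
    try {
      Files.write(tmp, Integer.toString(index).getBytes(StandardCharsets.US_ASCII));
      // Whether an existing target is replaced is implementation specific;
      // on POSIX file systems this maps to rename(2), which replaces it.
      Files.move(tmp, cacheFile, StandardCopyOption.ATOMIC_MOVE);
    } finally {
      Files.deleteIfExists(tmp); // no-op once the move has succeeded
    }
  }
}
```

With validation like this, a bad entry should at worst send the client to the wrong NameNode first, after which the normal failover path applies. The sketch does not settle the cross-user question: {{Files.createTempFile}} creates owner-only files on POSIX systems, so sharing the cache between users would need an explicit decision about permissions.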

I like the idea but am curious to hear what others think. cc [~shv] [~elgoiri] 
[~vagarychen] [~crh]

> Add HDFS Client machine caching active namenode index mechanism.
> ----------------------------------------------------------------
>
>                 Key: HDFS-14963
>                 URL: https://issues.apache.org/jira/browse/HDFS-14963
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs-client
>    Affects Versions: 3.1.3
>            Reporter: Xudong Cao
>            Assignee: Xudong Cao
>            Priority: Minor
>
> In a multi-NameNode deployment, a new HDFS client always sends its first RPC 
> to the 1st NameNode, then polls the NameNodes in order until it finds the 
> current Active NameNode. 
> This causes at least two problems:
>  # Extra failover overhead, especially when clients are created frequently.
>  # Unnecessary logging. Suppose there are 3 NNs and the 3rd is the ANN. A 
> client that starts its RPC at the 1st NN is silent when it fails over from 
> the 1st NN to the 2nd, but when it fails over from the 2nd NN to the 3rd it 
> prints log entries like the following. In some scenarios these logs become 
> very numerous:
> {code:java}
> 2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
>  Operation category READ is not supported in state standby. Visit 
> https://s.apache.org/sbnn-error
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
>  at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
>  ...{code}
> We can address this as follows: on the client machine, for every HDFS 
> cluster, cache the index of its current Active NameNode in a separate cache 
> file named after the cluster's URI. *Note that these cache files are shared 
> by all HDFS client processes on this machine*.
> For example, suppose there are hdfs://ns1 and hdfs://ns2, and the client 
> machine's cache file directory is /tmp. Then:
>  # the cache file for the ns1 cluster is /tmp/ns1
>  # the cache file for the ns2 cluster is /tmp/ns2
> And then:
>  #  When a client starts, it reads the current Active NameNode index from the 
> cache file corresponding to the target HDFS URI, and then makes its RPC call 
> directly to the ANN.
>  #  After each failover, the client writes the latest Active NameNode index 
> to the cache file corresponding to the target HDFS URI.
>  
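
The effect of the two steps above on the order in which a client tries NameNodes can be sketched as follows. This is only an illustration under the assumption that the client otherwise tries NameNodes in configured order starting at index 0, as the description states. {{ProbeOrder}} is a hypothetical name, not an existing Hadoop class, and the real proxy providers may organize failover differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical illustration, not part of Hadoop.
final class ProbeOrder {
  private ProbeOrder() {}

  /**
   * Returns the NameNode indexes in the order a client would try them.
   * A usable cached index moves the starting point; otherwise start at 0.
   */
  static List<Integer> from(OptionalInt cachedIndex, int nnCount) {
    int start = 0;
    if (cachedIndex.isPresent()
        && cachedIndex.getAsInt() >= 0
        && cachedIndex.getAsInt() < nnCount) {
      start = cachedIndex.getAsInt();
    }
    List<Integer> order = new ArrayList<>(nnCount);
    for (int i = 0; i < nnCount; i++) {
      order.add((start + i) % nnCount); // wrap around after the last NameNode
    }
    return order;
  }
}
```

For the 3-NameNode example in the description, where the 3rd NameNode (index 2) is active, a cached index of 2 gives the order 2, 0, 1, so the first call reaches the ANN without any failover. Without a cache the order stays 0, 1, 2.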



