[ https://issues.apache.org/jira/browse/HDFS-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xudong Cao updated HDFS-14963:
------------------------------
Description:
In a multi-NameNode setup, the HDFS client always starts its RPC calls from the
1st NameNode, polls through the configured NameNodes in order, and only then
determines the current Active NameNode. This brings at least two problems:
# Extra failover cost, especially when new client processes are started
frequently.
# Unnecessary log printing. Suppose there are 3 NNs and the 3rd is the ANN, and
a client starts its RPC with the 1st NN: it stays silent when failing over from
the 1st NN to the 2nd NN, but when failing over from the 2nd NN to the 3rd NN
it prints logs like the following, which can become very numerous in some
scenarios:
{code:java}
2019-11-07 11:35:41,577 INFO retry.RetryInvocationHandler: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2052)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1459)
        ...{code}
We can introduce a solution for this problem: on the client side, cache the
current Active NameNode index of every HDFS cluster in a separate cache file
(a rough sketch follows this list), so that:
# When a client starts, it reads the current Active NameNode index from the
corresponding cache file, chosen by the target HDFS URI, and then directly
makes its RPC call to the right ANN.
# After each failover, the client writes the latest Active NameNode index back
to the corresponding cache file, again chosen by the target HDFS URI.
For example, suppose the clusters hdfs://ns1 and hdfs://ns2 exist and the
client's cache file directory is /tmp; then:
# the cache file for the ns1 cluster is /tmp/ns1
# the cache file for the ns2 cluster is /tmp/ns2
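Below is a minimal sketch of what such a per-nameservice cache could look like on the client side. The class and method names (ActiveNNIndexCache, read, write) and the plain-text file format are illustrative assumptions, not the actual patch; a real implementation would also have to consider file permissions and concurrent clients.
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Illustrative sketch only: names and file format are hypothetical,
 * not the actual HDFS-14963 implementation.
 */
public class ActiveNNIndexCache {
  private final Path cacheDir;

  public ActiveNNIndexCache(String cacheDir) {
    this.cacheDir = Paths.get(cacheDir);
  }

  /** Returns the cached Active NameNode index for a nameservice, or 0 if unknown. */
  public int read(String nameservice) {
    Path f = cacheDir.resolve(nameservice); // e.g. /tmp/ns1
    try {
      String s = new String(Files.readAllBytes(f), StandardCharsets.UTF_8).trim();
      return Integer.parseInt(s);
    } catch (IOException | NumberFormatException e) {
      return 0; // no usable cache: fall back to starting from the 1st NN
    }
  }

  /** Persists the latest Active NameNode index after a successful failover. */
  public void write(String nameservice, int activeIndex) {
    Path f = cacheDir.resolve(nameservice);
    try {
      Files.createDirectories(cacheDir);
      Files.write(f, Integer.toString(activeIndex).getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      // Best effort: a failed cache write only costs extra failovers next time.
    }
  }
}
{code}
The client's failover proxy provider (e.g. ConfiguredFailoverProxyProvider) could consult read() when it builds its proxy list, so the first RPC goes to the cached ANN, and call write() after each failover that lands on a new Active NameNode.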
> Add DFS Client caching active namenode mechanism.
> -------------------------------------------------
>
> Key: HDFS-14963
> URL: https://issues.apache.org/jira/browse/HDFS-14963
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs-client
> Affects Versions: 3.1.3
> Reporter: Xudong Cao
> Assignee: Xudong Cao
> Priority: Minor
>