Akira Ajisaka created HDFS-15555:
------------------------------------
Summary: RBF: Refresh cacheNS when SocketException occurs
Key: HDFS-15555
URL: https://issues.apache.org/jira/browse/HDFS-15555
Project: Hadoop HDFS
Issue Type: Sub-task
Components: rbf
Reporter: Akira Ajisaka
Assignee: Akira Ajisaka
Problem:
When the active NameNode is restarted and is loading the fsimage, DFSRouters
slow down significantly.
Investigation:
While the restarted active NameNode is loading the fsimage, RouterRpcClient
receives SocketException. Because
RouterRpcClient#isUnavailableException(IOException) returns false for
SocketException, MembershipNamenodeResolver#cacheNS is not refreshed. As a
result, the order of the NameNodes returned by
MembershipNamenodeResolver#getNamenodesForNameserviceId(String) is unchanged
and the restarted NameNode is still returned first, so RouterRpcClient keeps
trying to connect to the NameNode that is loading the fsimage.
Once the fsimage is loaded, the NameNode throws StandbyException. This
exception is treated as an unavailable exception, so cacheNS is refreshed.
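The behavior described above can be illustrated with a small, self-contained
model. This is only a sketch and not the Hadoop source: the class
CacheRefreshSketch, the nested Resolver and StandbyException stand-ins, the
NameNode IDs "nn1"/"nn2", and the boolean flag are all hypothetical names
introduced for illustration. The refresh is modeled as moving the failed
NameNode to the end of the cached list, which is an assumption made to keep the
example short.

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of the reported behavior; not the actual Hadoop code. */
class CacheRefreshSketch {

  /** Hypothetical stand-in for org.apache.hadoop.ipc.StandbyException. */
  static class StandbyException extends IOException {
    StandbyException(String msg) {
      super(msg);
    }
  }

  /** Hypothetical resolver that serves a cached NameNode order. */
  static class Resolver {
    private final List<String> cachedOrder;

    Resolver(List<String> initialOrder) {
      this.cachedOrder = new ArrayList<>(initialOrder);
    }

    /** Returns a copy of the cached order; the first entry is tried first. */
    List<String> getNamenodes() {
      return new ArrayList<>(cachedOrder);
    }

    /** Assumption: model a cache refresh by demoting the failed NameNode. */
    void refresh(String failedNamenode) {
      if (cachedOrder.remove(failedNamenode)) {
        cachedOrder.add(failedNamenode);
      }
    }
  }

  /**
   * Simplified availability check. With the flag set to false it mirrors the
   * reported behavior: SocketException is not treated as "unavailable".
   */
  static boolean isUnavailable(IOException e, boolean socketIsUnavailable) {
    if (e instanceof StandbyException) {
      return true;
    }
    return socketIsUnavailable && e instanceof SocketException;
  }

  /** Returns the NameNode order seen after a call to the first NameNode fails. */
  static List<String> orderAfterFailure(IOException e, boolean socketIsUnavailable) {
    Resolver resolver = new Resolver(Arrays.asList("nn1", "nn2"));
    String first = resolver.getNamenodes().get(0);
    if (isUnavailable(e, socketIsUnavailable)) {
      resolver.refresh(first);
    }
    return resolver.getNamenodes();
  }

  public static void main(String[] args) {
    // Reported behavior: the order is unchanged, so nn1 is retried first.
    System.out.println(orderAfterFailure(new SocketException("reset"), false));
    // Proposed behavior: SocketException triggers a refresh.
    System.out.println(orderAfterFailure(new SocketException("reset"), true));
    // StandbyException triggers a refresh in both cases.
    System.out.println(orderAfterFailure(new StandbyException("standby"), false));
  }
}
```

In this model, the first call leaves "nn1" at the head of the list, which
corresponds to the Router repeatedly contacting the NameNode that is still
loading the fsimage. The other two calls move "nn1" to the end.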
Workaround:
Instead of restarting the NameNode, stop it and wait 1 minute before starting
it again.