[
https://issues.apache.org/jira/browse/HDFS-15555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188964#comment-17188964
]
Janus Chow commented on HDFS-15555:
-----------------------------------
I wonder why the new Active NN won't trigger the refresh of the cache.
> RBF: Refresh cacheNS when SocketException occurs
> ------------------------------------------------
>
> Key: HDFS-15555
> URL: https://issues.apache.org/jira/browse/HDFS-15555
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: rbf
> Environment: HDFS 3.3.0, Java 11
> Reporter: Akira Ajisaka
> Assignee: Akira Ajisaka
> Priority: Major
> Labels: pull-request-available
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Problem:
> When the active NameNode is restarted and is loading the fsimage, DFSRouters
> slow down significantly.
> Investigation:
> When the active NameNode is restarted and is loading the fsimage,
> RouterRpcClient receives a SocketException. Since
> RouterRpcClient#isUnavailableException(IOException) returns false when the
> argument is a SocketException, MembershipNameNodeResolver#cacheNS is not
> refreshed. As a result, the order of the NameNodes returned by
> MembershipNameNodeResolver#getNamenodesForNameserviceId(String) is unchanged
> and the active NameNode is still returned first. Therefore RouterRpcClient
> keeps trying to connect to the NameNode that is loading the fsimage.
> After loading the fsimage, the NameNode throws StandbyException. This
> exception is one of the 'unavailable' exceptions, so cacheNS is refreshed.
> Workaround:
> Stop the NameNode and wait 1 minute before starting it again, instead of
> restarting it.
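
The investigation quoted above can be illustrated with a small, self-contained
Java sketch. It is a simplified model and not the Hadoop implementation:
CacheRefreshSketch, ToyResolver, StandbyLikeException, and the two classifier
methods are hypothetical names invented for this example, and the real
RouterRpcClient#isUnavailableException may check more exception types than the
one shown here. The sketch assumes that a "refresh" simply moves the failed
NameNode to the end of the returned order.

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, simplified model of the behavior described in the report.
// None of these types are Hadoop classes.
final class CacheRefreshSketch {

    /** Local stand-in for a StandbyException-style error (assumption). */
    static final class StandbyLikeException extends IOException {
        StandbyLikeException(String message) {
            super(message);
        }
    }

    /** Toy resolver: callers try the first NameNode in the list first. */
    static final class ToyResolver {
        private final List<String> order;

        ToyResolver(String... namenodes) {
            this.order = new ArrayList<>(Arrays.asList(namenodes));
        }

        /** Models a cache refresh by moving the failed NameNode to the end. */
        void refreshAfterFailure(String failed) {
            if (order.remove(failed)) {
                order.add(failed);
            }
        }

        List<String> getNamenodes() {
            return new ArrayList<>(order);
        }
    }

    /** Classification as reported: SocketException is not "unavailable". */
    static boolean isUnavailableAsReported(IOException e) {
        return e instanceof StandbyLikeException;
    }

    /** Classification suggested by the issue title: also accept SocketException. */
    static boolean isUnavailableWithSocketException(IOException e) {
        return isUnavailableAsReported(e) || e instanceof SocketException;
    }

    /** Refreshes the resolver only when the classifier reports "unavailable". */
    static void onFailure(ToyResolver resolver, String namenode,
                          IOException error, Predicate<IOException> isUnavailable) {
        if (isUnavailable.test(error)) {
            resolver.refreshAfterFailure(namenode);
        }
    }

    public static void main(String[] args) {
        IOException error = new SocketException("Connection reset");

        ToyResolver reported = new ToyResolver("nn1", "nn2");
        onFailure(reported, "nn1", error, CacheRefreshSketch::isUnavailableAsReported);
        System.out.println("as reported: " + reported.getNamenodes());

        ToyResolver changed = new ToyResolver("nn1", "nn2");
        onFailure(changed, "nn1", error, CacheRefreshSketch::isUnavailableWithSocketException);
        System.out.println("with SocketException handled: " + changed.getNamenodes());
    }
}
```

In this model, the first call leaves nn1 at the head of the list, which matches
the reported symptom of the router repeatedly contacting the NameNode that is
still loading the fsimage. The second call moves nn1 behind nn2.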