[jira] [Commented] (HDFS-15555) RBF: Refresh cacheNS when SocketException occurs

Janus Chow (Jira) Tue, 01 Sep 2020 21:17:51 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-15555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17188964#comment-17188964
 ]


Janus Chow commented on HDFS-15555:
-----------------------------------

I wonder why the new Active NN won't trigger the refresh of the cache.

> RBF: Refresh cacheNS when SocketException occurs
> ------------------------------------------------
>
>                 Key: HDFS-15555
>                 URL: https://issues.apache.org/jira/browse/HDFS-15555
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: rbf
>         Environment: HDFS 3.3.0, Java 11
>            Reporter: Akira Ajisaka
>            Assignee: Akira Ajisaka
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Problem:
> When the active NameNode is restarted and is loading the fsimage, DFSRouters 
> slow down significantly.
> Investigation:
> When the active NameNode is restarted and is loading the fsimage, 
> RouterRpcClient receives a SocketException. Since 
> RouterRpcClient#isUnavailableException(IOException) returns false when the 
> argument is a SocketException, the MembershipNameNodeResolver#cacheNS is not 
> refreshed. As a result, the order of the NameNodes returned by 
> MembershipNameNodeResolver#getNamenodesForNameserviceId(String) is unchanged 
> and the restarted active NameNode is still returned first. Therefore 
> RouterRpcClient keeps trying to connect to the NameNode that is loading the 
> fsimage.
> After loading the fsimage, the NameNode throws StandbyException. That 
> exception is one of the 'unavailable' exceptions, so the cacheNS is refreshed.
> Workaround:
> Stop the NameNode and wait 1 minute before starting it again, instead of 
> restarting it.
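
The behaviour described in the quoted investigation can be modelled with a small, self-contained sketch. This is not the Hadoop code: the class CacheRefreshSketch, its methods, the nested StandbyException stand-in, and the boolean flag are all hypothetical names introduced for illustration. Rotating the cached list stands in for the real cache refresh. The flag switches between the behaviour reported in the issue (SocketException is ignored) and the behaviour suggested by the issue title (SocketException also triggers a refresh).

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified model of the router-side cache; not the actual Hadoop classes. */
final class CacheRefreshSketch {

    /** Local stand-in for Hadoop's StandbyException so the sketch compiles on its own. */
    static class StandbyException extends IOException {
        StandbyException(String message) {
            super(message);
        }
    }

    /** Cached NameNode order for one nameservice; the first entry is tried first. */
    private final List<String> cachedOrder;

    /** Assumption: models the change proposed in the issue title. */
    private final boolean treatSocketExceptionAsUnavailable;

    CacheRefreshSketch(List<String> initialOrder, boolean treatSocketExceptionAsUnavailable) {
        this.cachedOrder = new ArrayList<>(initialOrder);
        this.treatSocketExceptionAsUnavailable = treatSocketExceptionAsUnavailable;
    }

    /** Decides whether a failure should invalidate the cached order. */
    boolean isUnavailable(IOException e) {
        if (e instanceof StandbyException) {
            return true; // per the issue text, this already triggers a refresh
        }
        return treatSocketExceptionAsUnavailable && e instanceof SocketException;
    }

    /** Simulates a failed call to the NameNode that is currently first in the cache. */
    void onCallFailure(IOException e) {
        if (isUnavailable(e) && cachedOrder.size() > 1) {
            // Stand-in for a real refresh: demote the failed NameNode to the end.
            cachedOrder.add(cachedOrder.remove(0));
        }
    }

    String firstNamenode() {
        return cachedOrder.get(0);
    }
}
```

With the flag set to false, a SocketException leaves the restarted NameNode at the head of the list, which matches the slowdown described in the issue. With the flag set to true, the same exception moves another NameNode to the front.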



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

