[jira] [Commented] (HDFS-15555) RBF: Refresh cacheNS when SocketException occurs

chuanjie.duan (Jira) Wed, 24 Apr 2024 00:20:06 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-15555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17840305#comment-17840305
 ]


chuanjie.duan commented on HDFS-15555:
--------------------------------------

[~elgoiri] [~aajisaka] I'm not sure why the "ioe instanceof ConnectException" check was deleted.

> RBF: Refresh cacheNS when SocketException occurs
> ------------------------------------------------
>
>                 Key: HDFS-15555
>                 URL: https://issues.apache.org/jira/browse/HDFS-15555
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: rbf
>    Affects Versions: 3.3.1, 3.4.0
>         Environment: HDFS 3.3.0, Java 11
>            Reporter: Akira Ajisaka
>            Assignee: Akira Ajisaka
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.1, 3.4.0
>
>          Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> Problem:
> When the active NameNode is restarted and is loading the fsimage, DFSRouters 
> slow down significantly.
> Investigation:
> When the active NameNode is restarted and is loading the fsimage, 
> RouterRpcClient receives a SocketException. Since 
> RouterRpcClient#isUnavailableException(IOException) returns false when the 
> argument is a SocketException, MembershipNameNodeResolver#cacheNS is not 
> refreshed. As a result, the order of the NameNodes returned by 
> MembershipNameNodeResolver#getNamenodesForNameserviceId(String) is unchanged 
> and the active NameNode is still returned first. RouterRpcClient therefore 
> keeps trying to connect to the NameNode that is loading the fsimage.
> After loading the fsimage, the NameNode throws StandbyException. This 
> exception is one of the 'Unavailable Exceptions', so the cacheNS is refreshed.
> Workaround:
> Stop the NameNode and wait 1 minute before starting it again, instead of 
> restarting it.
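
The classification described in the investigation can be sketched in isolation as follows. This is a minimal, self-contained illustration and does not reproduce the actual RouterRpcClient code: the class name UnavailableCheckSketch, the method isUnavailable, and the StandbyLikeException stand-in are all hypothetical, introduced only so the example compiles without Hadoop on the classpath. One fact it relies on is certain: in the JDK, java.net.ConnectException is a subclass of java.net.SocketException, so a check for SocketException also matches ConnectException.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.SocketException;

/** Hypothetical sketch; not the real RouterRpcClient implementation. */
final class UnavailableCheckSketch {

    /** Stand-in for Hadoop's StandbyException so the sketch has no Hadoop dependency. */
    static class StandbyLikeException extends IOException {
        StandbyLikeException(String message) {
            super(message);
        }
    }

    /**
     * Returns true when the exception suggests the target NameNode is
     * unavailable, i.e. when a cached NameNode ordering should be refreshed.
     * Assumption: SocketException is treated as "unavailable", as the issue
     * title proposes.
     */
    static boolean isUnavailable(IOException ioe) {
        return ioe instanceof StandbyLikeException
            // ConnectException extends SocketException, so it is covered here too.
            || ioe instanceof SocketException;
    }

    public static void main(String[] args) {
        System.out.println(isUnavailable(new SocketException("Connection reset")));
        System.out.println(isUnavailable(new ConnectException("Connection refused")));
        System.out.println(isUnavailable(new StandbyLikeException("standby")));
        System.out.println(isUnavailable(new IOException("unrelated failure")));
    }
}
```

Under these assumptions, a separate "instanceof ConnectException" test would be redundant once SocketException is checked, because the subclass is already matched. Whether that was the reason for the change discussed in this issue is not stated here.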



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
