[ https://issues.apache.org/jira/browse/HDFS-16369?focusedWorklogId=689760&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-689760 ]

ASF GitHub Bot logged work on HDFS-16369:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Dec/21 04:56
            Start Date: 03/Dec/21 04:56
    Worklog Time Spent: 10m 
      Work Description: ayushtkn commented on a change in pull request #3745:
URL: https://github.com/apache/hadoop/pull/3745#discussion_r761646273



##########
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRPCMultipleDestinationMountTableResolver.java
##########
@@ -668,14 +674,16 @@ public void testInvokeAtAvailableNs() throws IOException {
     // Make one subcluster unavailable.
     MiniDFSCluster dfsCluster = cluster.getCluster();
     dfsCluster.shutdownNameNode(0);
+    dfsCluster.shutdownNameNode(1);
     try {
       // Verify that #invokeAtAvailableNs works by calling #getServerDefaults.
       RemoteMethod method = new RemoteMethod("getServerDefaults");
       FsServerDefaults serverDefaults =
           rpcServer.invokeAtAvailableNs(method, FsServerDefaults.class);
       assertNotNull(serverDefaults);

Review comment:
       I thought of using NamenodeMetrics, but it doesn't track
getServerDefaults. The second option is RBFClientMetrics, but it counts every
getServerDefaults invocation irrespective of success, and it isn't per
namespace either. So if the invocation order ever changes, for example from
ns0->ns1->ns2 to ns0->ns2->ns1, the test will become flaky.
   
   In general, without the fix the getServerDefaults call fails only if two
namespaces are down, so I thought that if the call succeeds we can conclude
things work. Let me know if you have any ideas about what we can assert.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


Issue Time Tracking
-------------------

    Worklog Id:     (was: 689760)
    Time Spent: 50m  (was: 40m)

> RBF: Fix the retry logic of RouterRpcServer#invokeAtAvailableNs
> ---------------------------------------------------------------
>
>                 Key: HDFS-16369
>                 URL: https://issues.apache.org/jira/browse/HDFS-16369
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Ayush Saxena
>            Assignee: Ayush Saxena
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently, invokeAtAvailableNs retries only once if the default or the first
> namespace is not available, even though other namespaces are available.
> Optimise it to retry on all namespaces.
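The difference between retrying once and retrying on all namespaces can be illustrated with a minimal, self-contained Java sketch. It is not the RouterRpcServer code. The class InvokeAtAvailableNsSketch, the NsCall interface, and the methods invokeWithSingleRetry and invokeOnAllNamespaces are hypothetical. Namespaces are modelled as plain strings, and the try order (default namespace first, then the rest in list order) is an assumption. The main method mirrors the scenario in the test diff above, with ns0 and ns1 down and ns2 up.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not the actual RouterRpcServer#invokeAtAvailableNs implementation.
public class InvokeAtAvailableNsSketch {

  /** Hypothetical stand-in for a remote call made against one namespace. */
  @FunctionalInterface
  interface NsCall<T> {
    T invoke(String nsId) throws IOException;
  }

  /** Old behaviour (sketch): default namespace, then at most one other namespace. */
  static <T> T invokeWithSingleRetry(List<String> namespaces, String defaultNs,
      NsCall<T> call) throws IOException {
    try {
      return call.invoke(defaultNs);
    } catch (IOException e) {
      for (String ns : namespaces) {
        if (!ns.equals(defaultNs)) {
          // Only one retry: the first non-default namespace; its failure propagates.
          return call.invoke(ns);
        }
      }
      throw e;
    }
  }

  /** Fixed behaviour (sketch): try every namespace until one succeeds. */
  static <T> T invokeOnAllNamespaces(List<String> namespaces, String defaultNs,
      NsCall<T> call) throws IOException {
    // Assumed order: default namespace first, then the remaining ones.
    List<String> order = new ArrayList<>();
    order.add(defaultNs);
    for (String ns : namespaces) {
      if (!ns.equals(defaultNs)) {
        order.add(ns);
      }
    }
    IOException last = null;
    for (String ns : order) {
      try {
        return call.invoke(ns);
      } catch (IOException e) {
        last = e; // remember the failure and move on to the next namespace
      }
    }
    throw last != null ? last : new IOException("No namespace available");
  }

  public static void main(String[] args) throws IOException {
    List<String> namespaces = List.of("ns0", "ns1", "ns2");
    Set<String> down = Set.of("ns0", "ns1"); // two namespaces shut down, as in the test
    NsCall<String> getServerDefaults = ns -> {
      if (down.contains(ns)) {
        throw new IOException(ns + " is unavailable");
      }
      return "defaults from " + ns;
    };

    try {
      invokeWithSingleRetry(namespaces, "ns0", getServerDefaults);
      System.out.println("single retry: succeeded");
    } catch (IOException e) {
      System.out.println("single retry: failed (" + e.getMessage() + ")");
    }
    System.out.println("all namespaces: "
        + invokeOnAllNamespaces(namespaces, "ns0", getServerDefaults));
  }
}
```

Running it prints `single retry: failed (ns1 is unavailable)` followed by `all namespaces: defaults from ns2`. This matches the reasoning in the review comment: with two namespaces down, only the all-namespaces variant can succeed. A successful call is therefore evidence that the retry covers more than one fallback namespace.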



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
