[ 
https://issues.apache.org/jira/browse/HDFS-15900?focusedWorklogId=577308&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-577308
 ]

ASF GitHub Bot logged work on HDFS-15900:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 06/Apr/21 04:33
            Start Date: 06/Apr/21 04:33
    Worklog Time Spent: 10m 
      Work Description: tasanuma commented on a change in pull request #2866:
URL: https://github.com/apache/hadoop/pull/2866#discussion_r607493824



##########
File path: hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRPCClientRetries.java
##########
@@ -155,10 +155,10 @@ public void testRetryWhenOneNameServiceDown() throws Exception {
     // Renew lease for the DFS client, it will succeed.
     routerProtocol.renewLease(client.getClientName());
 
-    // Verify the retry times, it will retry one time for ns0.
+    // Verify the retry times, it should succeed with no retry as long as at least one of the nameservices is ACTIVE.
     FederationRPCMetrics rpcMetrics = routerContext.getRouter()
         .getRpcServer().getRPCMetrics();
-    assertEquals(1, rpcMetrics.getProxyOpRetries());
+    assertEquals(0, rpcMetrics.getProxyOpRetries());

Review comment:
       Somehow, the unit test passed in trunk even without this fix. I had an offline 
discussion with @hdaikoku, but we still don't know the cause. We will create an 
issue if necessary.






Issue Time Tracking
-------------------

    Worklog Id:     (was: 577308)
    Time Spent: 5.5h  (was: 5h 20m)

> RBF: empty blockpool id on dfsrouter caused by UNAVAILABLE NameNode
> -------------------------------------------------------------------
>
>                 Key: HDFS-15900
>                 URL: https://issues.apache.org/jira/browse/HDFS-15900
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rbf
>    Affects Versions: 3.3.0
>            Reporter: Harunobu Daikoku
>            Assignee: Harunobu Daikoku
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.3.1, 3.4.0
>
>         Attachments: image.png
>
>          Time Spent: 5.5h
>  Remaining Estimate: 0h
>
> We observed that when a NameNode becomes UNAVAILABLE, the corresponding 
> blockpool id in MembershipStoreImpl#activeNamespaces on dfsrouter is 
> unintentionally reset to empty, its initial value.
>  !image.png|height=250!
> As a result, concat operations through dfsrouter fail with the 
> following error, because the router cannot resolve the block pool id among 
> the recognized active namespaces.
> {noformat}
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.RemoteException): 
> Cannot locate a nameservice for block pool BP-...
> {noformat}
> A possible fix is to ignore UNAVAILABLE NameNode registrations, and set 
> proper namespace information obtained from available NameNode registrations 
> when constructing the cache of active namespaces.
>  
> [https://github.com/apache/hadoop/blob/rel/release-3.3.0/hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/store/impl/MembershipStoreImpl.java#L207-L221]
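
The fix proposed in the description above (ignore UNAVAILABLE registrations and take namespace information only from available ones) might look roughly like the following. This is a minimal, self-contained sketch and not the actual MembershipStoreImpl code: the Registration class, the State enum, the buildActiveNamespaces method, and the "BP-example-..." block pool ids are all hypothetical stand-ins chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; these are not the Hadoop RBF classes. */
final class ActiveNamespacesSketch {

  /** Simplified registration states (hypothetical stand-in). */
  enum State { ACTIVE, STANDBY, UNAVAILABLE }

  /** Hypothetical, simplified NameNode registration record. */
  static final class Registration {
    final String nameserviceId;
    final String blockPoolId;
    final State state;

    Registration(String nameserviceId, String blockPoolId, State state) {
      this.nameserviceId = nameserviceId;
      this.blockPoolId = blockPoolId;
      this.state = state;
    }
  }

  /**
   * Builds a nameservice-to-block-pool cache, skipping UNAVAILABLE
   * registrations so their empty block pool id never enters the cache.
   */
  static Map<String, String> buildActiveNamespaces(List<Registration> regs) {
    Map<String, String> active = new LinkedHashMap<>();
    for (Registration r : regs) {
      if (r.state == State.UNAVAILABLE) {
        continue; // carries no usable namespace information
      }
      // Keep the first available registration seen for each nameservice.
      active.putIfAbsent(r.nameserviceId, r.blockPoolId);
    }
    return active;
  }

  public static void main(String[] args) {
    // ns0 has one UNAVAILABLE NameNode (empty id) and one STANDBY NameNode.
    List<Registration> regs = Arrays.asList(
        new Registration("ns0", "", State.UNAVAILABLE),
        new Registration("ns0", "BP-example-0", State.STANDBY),
        new Registration("ns1", "BP-example-1", State.ACTIVE));
    System.out.println(buildActiveNamespaces(regs));
  }
}
```

In this sketch the UNAVAILABLE registration for ns0 is skipped, so the cache entry for ns0 comes from the STANDBY registration that still reports a block pool id. A nameservice whose NameNodes are all UNAVAILABLE simply does not appear in the result.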



