[ 
https://issues.apache.org/jira/browse/HDFS-16283?focusedWorklogId=787243&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-787243
 ]

ASF GitHub Bot logged work on HDFS-16283:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Jul/22 23:08
            Start Date: 01/Jul/22 23:08
    Worklog Time Spent: 10m 
      Work Description: ayushtkn commented on code in PR #4524:
URL: https://github.com/apache/hadoop/pull/4524#discussion_r912279636


##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java:
##########
@@ -759,14 +764,47 @@ public boolean mkdirs(String src, FsPermission masked, boolean createParent)
     }
   }
 
+  /**
+   * Try to get a list of FederationNamespaceInfo for renewLease RPC.
+   */
+  private List<FederationNamespaceInfo> getRewLeaseNSs(String nsIdentifies)
+      throws IOException {
+    // return all namespaces
+    if (nsIdentifies == null || nsIdentifies.isEmpty()) {
+      return new ArrayList<>(namenodeResolver.getNamespaces());
+    }
+    String[] nsIdList = nsIdentifies.split(",");
+    List<FederationNamespaceInfo> result = new ArrayList<>();
+    for (String nsId : nsIdList) {
+      FederationNamespaceInfo namespaceInfo = nsNameSpaceInfoCache.get(nsId);
+      if (namespaceInfo == null) {
+        try {
+          rpcClient.getNamenodesForNameservice(nsId);

Review Comment:
   What are you doing with the return value?



##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/NameNodeRpcServer.java:
##########
@@ -1174,8 +1174,10 @@ public boolean mkdirs(String src, FsPermission masked, boolean createParent)
   }
 
   @Override // ClientProtocol
-  public void renewLease(String clientName) throws IOException {
+  public void renewLease(String clientName, String nsIdentifies)
+      throws IOException {
     checkNNStartup();
+    // just ignore nsIdentifies

Review Comment:
   Better to have a check that it is ``null``, to prevent a user from 
accidentally passing some value to the Namenode and assuming it is being 
honoured.
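A minimal stand-alone sketch of the guard the comment asks for (the class name, exception message, and method body here are hypothetical stand-ins, not the actual `NameNodeRpcServer` code):

```java
import java.io.IOException;

public class RenewLeaseGuard {
    // Simplified stand-in for NameNodeRpcServer.renewLease: reject any
    // non-null namespace argument so a caller cannot assume the NameNode
    // honours it -- only the Router interprets this field.
    public static void renewLease(String clientName, String nsIdentifiers)
            throws IOException {
        if (nsIdentifiers != null) {
            throw new IOException(
                "namespace identifiers are only handled by the Router: "
                    + nsIdentifiers);
        }
        // ... the existing lease-renewal logic would follow here ...
    }
}
```

With this check, a client that mistakenly sends identifiers straight to a NameNode fails fast instead of silently having the argument ignored.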



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java:
##########
@@ -759,14 +764,47 @@ public boolean mkdirs(String src, FsPermission masked, boolean createParent)
     }
   }
 
+  /**
+   * Try to get a list of FederationNamespaceInfo for renewLease RPC.
+   */
+  private List<FederationNamespaceInfo> getRewLeaseNSs(String nsIdentifies)
+      throws IOException {
+    // return all namespaces
+    if (nsIdentifies == null || nsIdentifies.isEmpty()) {
+      return new ArrayList<>(namenodeResolver.getNamespaces());
+    }
+    String[] nsIdList = nsIdentifies.split(",");
+    List<FederationNamespaceInfo> result = new ArrayList<>();
+    for (String nsId : nsIdList) {
+      FederationNamespaceInfo namespaceInfo = nsNameSpaceInfoCache.get(nsId);
+      if (namespaceInfo == null) {
+        try {
+          rpcClient.getNamenodesForNameservice(nsId);
+        } catch (IOException ioe) {
+          // return all namespaces when parsing nsId failed.
+          return new ArrayList<>(namenodeResolver.getNamespaces());
+        }
+        namespaceInfo = new FederationNamespaceInfo("", "", nsId);
+        nsNameSpaceInfoCache.put(nsId, namespaceInfo);

Review Comment:
   I didn't follow the logic of this new ``FederationNamespaceInfo`` creation. 
You have a cache map, which is empty; you do a get, it returns ``null``, you 
reach the if block and create the object explicitly. Why aren't we 
initialising the cache map from ``namenodeResolver.getNamespaces()``, or, when 
we don't find the entry in the cache, why don't we go ahead and try to find it 
in ``namenodeResolver.getNamespaces()``?
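A stand-alone sketch of the fallback the comment suggests (all names here are simplified hypothetical stand-ins for `FederationNamespaceInfo` and `namenodeResolver.getNamespaces()`, not the actual Hadoop classes):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class NsInfoCache {
    // Minimal hypothetical stand-in for FederationNamespaceInfo.
    public static class NamespaceInfo {
        public final String nameserviceId;
        public NamespaceInfo(String nameserviceId) {
            this.nameserviceId = nameserviceId;
        }
    }

    private final Map<String, NamespaceInfo> cache = new HashMap<>();
    // Stand-in for namenodeResolver.getNamespaces().
    private final Set<NamespaceInfo> resolverNamespaces;

    public NsInfoCache(Set<NamespaceInfo> resolverNamespaces) {
        this.resolverNamespaces = resolverNamespaces;
    }

    // On a cache miss, look the nameservice up in the resolver's set
    // instead of fabricating a new entry with empty fields.
    public NamespaceInfo get(String nsId) {
        NamespaceInfo info = cache.get(nsId);
        if (info == null) {
            for (NamespaceInfo candidate : resolverNamespaces) {
                if (candidate.nameserviceId.equals(nsId)) {
                    cache.put(nsId, candidate);
                    info = candidate;
                    break;
                }
            }
        }
        return info;  // null when the resolver does not know nsId
    }
}
```

The point is that entries come from the resolver (which has the real fields populated) rather than being constructed with empty strings.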



##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:
##########
@@ -579,6 +581,28 @@ void updateLastLeaseRenewal() {
     }
   }
 
+  /**
+   * Get all nsIdentifies of DFSOutputStreams.

Review Comment:
   ``Identifies`` in the method names and arguments doesn't make sense. Can we 
change it to ``NsIdentifiers``? I am also fine with just 
``namespaces``/``namespace``.



##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSClient.java:
##########
@@ -579,6 +581,28 @@ void updateLastLeaseRenewal() {
     }
   }
 
+  /**
+   * Get all nsIdentifies of DFSOutputStreams.
+   */
+  private String getRenewLeaseNSIdentifies() {
+    HashSet<String> allNSIdentifies = new HashSet<>();
+    synchronized (filesBeingWritten) {
+      if (filesBeingWritten.isEmpty()) {
+        return null;
+      }
+      for (DFSOutputStream outputStream : filesBeingWritten.values()) {
+        String nsIdentify = outputStream.getNsIdentify();
+        if (nsIdentify != null && !nsIdentify.isEmpty()) {
+          allNSIdentifies.add(nsIdentify);

Review Comment:
   In which case can it be ``null`` or ``empty``?
   
   One case I can think of is the Router being at an older version than the 
client, i.e. the Router doesn't have this change but the client is upgraded.
   
   That scenario should be sorted out: if either identifier is ``null`` or 
``empty``, pass ``null`` or so to the Router and make sure the old behaviour 
of sending the RPC to all namespaces stays intact.
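A hypothetical client-side helper illustrating that fallback (the method name and shape are assumptions, not the actual `DFSClient` code):

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RenewLeaseIdentifiers {
    // If any open stream lacks a namespace identifier (e.g. it was opened
    // through an older Router that doesn't supply one), return null so the
    // server falls back to renewing the lease against all namespaces.
    public static String buildNsIdentifiers(List<String> perStreamIds) {
        Set<String> ids = new LinkedHashSet<>();
        for (String id : perStreamIds) {
            if (id == null || id.isEmpty()) {
                return null;  // preserve the old fan-out behaviour
            }
            ids.add(id);
        }
        return ids.isEmpty() ? null : String.join(",", ids);
    }
}
```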



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRpc.java:
##########
@@ -1450,6 +1452,43 @@ public void testProxyRestoreFailedStorage() throws Exception {
     assertEquals(nnSuccess, routerSuccess);
   }
 
+  @Test
+  public void testRewnewLease() throws Exception {
+    // Install a mount point to a different path to check
+    MockResolver resolver =
+        (MockResolver)router.getRouter().getSubclusterResolver();
+    String ns0 = cluster.getNameservices().get(0);
+    String ns1 = cluster.getNameservices().get(1);
+    resolver.addLocation("/testRenewLease0", ns0, "/testRenewLease0");
+    resolver.addLocation("/testRenewLease1", ns1, "/testRenewLease1");
+
+    // Stop LeaseRenewer
+    DistributedFileSystem dfsRouterFS = (DistributedFileSystem) routerFS;
+    dfsRouterFS.getClient().getLeaseRenewer().interruptAndJoin();
+
+    Path testPath = new Path("/testRenewLease0/test.txt");
+    FSDataOutputStream fsDataOutputStream = routerFS.create(testPath);

Review Comment:
   Test for both replicated as well as Erasure Coded files.



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java:
##########
@@ -759,14 +764,47 @@ public boolean mkdirs(String src, FsPermission masked, boolean createParent)
     }
   }
 
+  /**
+   * Try to get a list of FederationNamespaceInfo for renewLease RPC.
+   */
+  private List<FederationNamespaceInfo> getRewLeaseNSs(String nsIdentifies)
+      throws IOException {
+    // return all namespaces
+    if (nsIdentifies == null || nsIdentifies.isEmpty()) {
+      return new ArrayList<>(namenodeResolver.getNamespaces());
+    }
+    String[] nsIdList = nsIdentifies.split(",");

Review Comment:
   At the client we first do a ``String.join``, then here we split it again. 
Can't we pass an array/set/list, whichever is possible, and get rid of this 
join & split overhead during the call?
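For illustration, the round trip the comment describes (a plain-Java sketch; the real change would carry a repeated/list field in the RPC instead of one string):

```java
import java.util.Arrays;
import java.util.List;

public class JoinSplitRoundTrip {
    // The client comma-joins the nameservice ids and the Router splits
    // them again. Passing the collection through the RPC directly would
    // skip both steps.
    public static List<String> roundTrip(List<String> nsIds) {
        String wire = String.join(",", nsIds);    // client side
        return Arrays.asList(wire.split(","));    // router side
    }
}
```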



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java:
##########
@@ -759,14 +764,47 @@ public boolean mkdirs(String src, FsPermission masked, boolean createParent)
     }
   }
 
+  /**
+   * Try to get a list of FederationNamespaceInfo for renewLease RPC.
+   */
+  private List<FederationNamespaceInfo> getRewLeaseNSs(String nsIdentifies)
+      throws IOException {
+    // return all namespaces
+    if (nsIdentifies == null || nsIdentifies.isEmpty()) {
+      return new ArrayList<>(namenodeResolver.getNamespaces());
+    }
+    String[] nsIdList = nsIdentifies.split(",");
+    List<FederationNamespaceInfo> result = new ArrayList<>();
+    for (String nsId : nsIdList) {
+      FederationNamespaceInfo namespaceInfo = nsNameSpaceInfoCache.get(nsId);
+      if (namespaceInfo == null) {
+        try {
+          rpcClient.getNamenodesForNameservice(nsId);
+        } catch (IOException ioe) {
+          // return all namespaces when parsing nsId failed.
+          return new ArrayList<>(namenodeResolver.getNamespaces());
+        }
+        namespaceInfo = new FederationNamespaceInfo("", "", nsId);
+        nsNameSpaceInfoCache.put(nsId, namespaceInfo);
+      }
+      result.add(namespaceInfo);
+    }
+    return result;
+  }
+
   @Override
-  public void renewLease(String clientName) throws IOException {
+  public void renewLease(String clientName, String nsIdentifies)
+      throws IOException {
     rpcServer.checkOperation(NameNode.OperationCategory.WRITE);
 
     RemoteMethod method = new RemoteMethod("renewLease",
-        new Class<?>[] {String.class}, clientName);
-    Set<FederationNamespaceInfo> nss = namenodeResolver.getNamespaces();
-    rpcClient.invokeConcurrent(nss, method, false, false);
+        new Class<?>[] {String.class, String.class}, clientName, null);
+    List<FederationNamespaceInfo> nss = getRewLeaseNSs(nsIdentifies);
+    if (nss.size() == 1) {
+      rpcClient.invokeSingle(nss.get(0).getNameserviceId(), method);

Review Comment:
   The nsId is passed from the client; if we receive an array or similar, you 
can figure out up front whether you have only one entry, so you could get rid 
of ``getRewLeaseNSs(nsIdentifies)`` completely in that case.



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java:
##########
@@ -759,14 +764,47 @@ public boolean mkdirs(String src, FsPermission masked, boolean createParent)
     }
   }
 
+  /**
+   * Try to get a list of FederationNamespaceInfo for renewLease RPC.
+   */
+  private List<FederationNamespaceInfo> getRewLeaseNSs(String nsIdentifies)
+      throws IOException {
+    // return all namespaces
+    if (nsIdentifies == null || nsIdentifies.isEmpty()) {
+      return new ArrayList<>(namenodeResolver.getNamespaces());
+    }
+    String[] nsIdList = nsIdentifies.split(",");
+    List<FederationNamespaceInfo> result = new ArrayList<>();
+    for (String nsId : nsIdList) {
+      FederationNamespaceInfo namespaceInfo = nsNameSpaceInfoCache.get(nsId);
+      if (namespaceInfo == null) {
+        try {
+          rpcClient.getNamenodesForNameservice(nsId);
+        } catch (IOException ioe) {
+          // return all namespaces when parsing nsId failed.
+          return new ArrayList<>(namenodeResolver.getNamespaces());

Review Comment:
   Have some log line here.



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRpc.java:
##########
@@ -1450,6 +1452,43 @@ public void testProxyRestoreFailedStorage() throws Exception {
     assertEquals(nnSuccess, routerSuccess);
   }
 
+  @Test
+  public void testRewnewLease() throws Exception {
+    // Install a mount point to a different path to check
+    MockResolver resolver =
+        (MockResolver)router.getRouter().getSubclusterResolver();
+    String ns0 = cluster.getNameservices().get(0);
+    String ns1 = cluster.getNameservices().get(1);
+    resolver.addLocation("/testRenewLease0", ns0, "/testRenewLease0");
+    resolver.addLocation("/testRenewLease1", ns1, "/testRenewLease1");
+
+    // Stop LeaseRenewer
+    DistributedFileSystem dfsRouterFS = (DistributedFileSystem) routerFS;
+    dfsRouterFS.getClient().getLeaseRenewer().interruptAndJoin();
+
+    Path testPath = new Path("/testRenewLease0/test.txt");
+    FSDataOutputStream fsDataOutputStream = routerFS.create(testPath);
+
+    FederationRPCMetrics rpcMetrics = router.getRouterRpcServer().getRPCMetrics();
+    long proxyOpBeforeRenewLease = rpcMetrics.getProxyOps();
+    assertTrue(dfsRouterFS.getClient().renewLease());
+    long proxyOpAfterRenewLease = rpcMetrics.getProxyOps();
+    assertEquals((proxyOpBeforeRenewLease + 1), proxyOpAfterRenewLease);
+    fsDataOutputStream.close();
+
+    Path newTestPath0 = new Path("/testRenewLease0/test1.txt");
+    Path newTestPath1 = new Path("/testRenewLease1/test1.txt");
+    FSDataOutputStream fsDataOutputStream0 = routerFS.create(newTestPath0);
+    FSDataOutputStream fsDataOutputStream1 = routerFS.create(newTestPath1);

Review Comment:
   Does this change affect the ``Append`` flow as well?



##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRpc.java:
##########
@@ -1450,6 +1452,43 @@ public void testProxyRestoreFailedStorage() throws Exception {
     assertEquals(nnSuccess, routerSuccess);
   }
 
+  @Test
+  public void testRewnewLease() throws Exception {
+    // Install a mount point to a different path to check
+    MockResolver resolver =
+        (MockResolver)router.getRouter().getSubclusterResolver();
+    String ns0 = cluster.getNameservices().get(0);
+    String ns1 = cluster.getNameservices().get(1);
+    resolver.addLocation("/testRenewLease0", ns0, "/testRenewLease0");
+    resolver.addLocation("/testRenewLease1", ns1, "/testRenewLease1");
+
+    // Stop LeaseRenewer
+    DistributedFileSystem dfsRouterFS = (DistributedFileSystem) routerFS;
+    dfsRouterFS.getClient().getLeaseRenewer().interruptAndJoin();
+
+    Path testPath = new Path("/testRenewLease0/test.txt");
+    FSDataOutputStream fsDataOutputStream = routerFS.create(testPath);
+
+    FederationRPCMetrics rpcMetrics = router.getRouterRpcServer().getRPCMetrics();
+    long proxyOpBeforeRenewLease = rpcMetrics.getProxyOps();
+    assertTrue(dfsRouterFS.getClient().renewLease());
+    long proxyOpAfterRenewLease = rpcMetrics.getProxyOps();
+    assertEquals((proxyOpBeforeRenewLease + 1), proxyOpAfterRenewLease);
+    fsDataOutputStream.close();
+
+    Path newTestPath0 = new Path("/testRenewLease0/test1.txt");
+    Path newTestPath1 = new Path("/testRenewLease1/test1.txt");
+    FSDataOutputStream fsDataOutputStream0 = routerFS.create(newTestPath0);
+    FSDataOutputStream fsDataOutputStream1 = routerFS.create(newTestPath1);
+
+    long proxyOpBeforeRenewLease2 = rpcMetrics.getProxyOps();
+    assertTrue(dfsRouterFS.getClient().renewLease());
+    long proxyOpAfterRenewLease2 = rpcMetrics.getProxyOps();
+    assertEquals((proxyOpBeforeRenewLease2 + 2), proxyOpAfterRenewLease2);
+    fsDataOutputStream0.close();
+    fsDataOutputStream1.close();

Review Comment:
   Either use ``finally`` or try-with-resources for the close.
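A self-contained sketch of the try-with-resources pattern the comment asks for (`ByteArrayOutputStream` stands in for the test's `FSDataOutputStream`s; the recorder class is hypothetical):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class CloseWithTry {
    // Records whether close() ran, standing in for FSDataOutputStream.
    static final class Recorder extends ByteArrayOutputStream {
        boolean closed = false;
        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // try-with-resources closes both streams even if an assertion in the
    // body throws, which is what the review asks the test to guarantee.
    public static boolean[] demo() throws IOException {
        Recorder r0 = new Recorder();
        Recorder r1 = new Recorder();
        try (Recorder a = r0; Recorder b = r1) {
            a.write(1);
            b.write(2);
            // renewLease()/proxy-op assertions would go here
        }
        return new boolean[] {r0.closed, r1.closed};
    }
}
```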



##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java:
##########
@@ -763,7 +763,7 @@ SnapshotStatus[] getSnapshotListing(String snapshotRoot)
    * @throws IOException If an I/O error occurred
    */
   @Idempotent
-  void renewLease(String clientName) throws IOException;
+  void renewLease(String clientName, String allNSIdentifies) throws IOException;

Review Comment:
   Add detail about the new argument in the javadoc as well.





Issue Time Tracking
-------------------

    Worklog Id:     (was: 787243)
    Time Spent: 2h 40m  (was: 2.5h)

> RBF: improve renewLease() to call only a specific NameNode rather than make 
> fan-out calls
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-16283
>                 URL: https://issues.apache.org/jira/browse/HDFS-16283
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: rbf
>            Reporter: Aihua Xu
>            Assignee: Aihua Xu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: RBF_ improve renewLease() to call only a specific 
> NameNode rather than make fan-out calls.pdf
>
>          Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Currently renewLease() against a router will fan out to all the NameNodes. 
> Since the renewLease() call is so frequent, if one of the NameNodes is slow, 
> the router queues eventually get blocked by renewLease() calls and cause 
> router degradation. 
> We will make a change on the client side to keep track of the NameNode Id in 
> addition to the current fileId, so routers understand which NameNodes the 
> client is renewing the lease against.



