[jira] [Work logged] (HDFS-16382) RBF: getContentSummary RPC compute sub-directory repeatedly

ASF GitHub Bot (Jira) Sat, 18 Dec 2021 04:31:07 -0800


     [ 
https://issues.apache.org/jira/browse/HDFS-16382?focusedWorklogId=698237&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-698237
 ]


ASF GitHub Bot logged work on HDFS-16382:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 18/Dec/21 12:30
            Start Date: 18/Dec/21 12:30
    Worklog Time Spent: 10m 
      Work Description: taiyang-li commented on a change in pull request #3797:
URL: https://github.com/apache/hadoop/pull/3797#discussion_r771820322



##########
File path: 
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java
##########
@@ -1204,16 +1207,38 @@ public void setBalancerBandwidth(long bandwidth) throws IOException {
 
   @Override
   public ContentSummary getContentSummary(String path) throws IOException {
+    return getContentSummary(path, new HashMap<String, Set<String>>());
+  }
+  
+  public ContentSummary getContentSummary(String path, Map<String, Set<String>> excludeNamespace) throws IOException {
     rpcServer.checkOperation(NameNode.OperationCategory.READ);
 
     // Get the summaries from regular files
     final Collection<ContentSummary> summaries = new ArrayList<>();
     final List<RemoteLocation> locations =
         rpcServer.getLocationsForPath(path, false, false);
+    Map<String, Set<String>> currentExcludePathNsMap = new HashMap<>();
+    Set<String> curExcludeNamespace = new HashSet<>();
+    String destPath = subclusterResolver.getDestinationForPath(path).getDefaultLocation().getDest();
+    List<String> parentExistLocations = excludeNamespace.keySet().stream().filter(s -> destPath.startsWith(s))
+        .collect(Collectors.toList());
+    boolean parentAlreadyComputed = parentExistLocations.size() > 0;
+    List<RemoteLocation> filterLoctions =
+        locations.stream().filter(remoteLocation -> excludeNamespace.isEmpty() || !parentAlreadyComputed ||
+            !isParentPathNamespaceComputed(remoteLocation, excludeNamespace, parentExistLocations))
+            .collect(Collectors.toList());
+    filterLoctions.forEach(remoteLocation -> {
+      curExcludeNamespace.add(remoteLocation.getNameserviceId());
+    });
+    if (excludeNamespace.get(destPath) != null) {
+      excludeNamespace.get(destPath).addAll(curExcludeNamespace);
+    } else {
+      excludeNamespace.put(destPath, curExcludeNamespace);
+    }
     final RemoteMethod method = new RemoteMethod("getContentSummary",
         new Class<?>[] {String.class}, new RemoteParam());
     final List<RemoteResult<RemoteLocation, ContentSummary>> results =
-        rpcClient.invokeConcurrent(locations, method,
+        rpcClient.invokeConcurrent(filterLoctions, method,

Review comment:
       Found a misspelling: `filterLoctions` should be `filterLocations`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 698237)
    Time Spent: 0.5h  (was: 20m)

> RBF: getContentSummary RPC compute sub-directory repeatedly
> -----------------------------------------------------------
>
>                 Key: HDFS-16382
>                 URL: https://issues.apache.org/jira/browse/HDFS-16382
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rbf
>    Affects Versions: 3.3.1
>            Reporter: zhanghaobo
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The Router getContentSummary RPC counts a sub-directory repeatedly when a
> directory and its ancestor directory are both mounted under their original
> src paths.
> For example, suppose we have the mount table entries below:
> /A---ns1---/A
> /A/B---ns1,ns2---/A/B
> If we put a file test.txt into directory /A/B in namespace ns1 and then execute
> `hdfs dfs -count hdfs://router:8888/A`, the result is wrong because
> /A/B/test.txt is counted more than once.
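
The double counting described above can be illustrated with a small, self-contained sketch. It is not the code from the pull request: the class `MountDedupSketch` and its methods are hypothetical names, the mount table is modelled as a plain map from source path to namespace IDs, and it assumes every entry's destination path equals its source path (as in the example). Under that assumption, a namespace queried for an ancestor mount already covers the nested mount, so it can be skipped there.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch only; names are hypothetical, not Hadoop APIs. */
public class MountDedupSketch {

  /** True if path lies strictly below ancestor, matching whole path components. */
  public static boolean isStrictAncestor(String ancestor, String path) {
    if (ancestor.equals(path)) {
      return false;
    }
    // Append a separator so that "/A" is not treated as an ancestor of "/AB".
    String prefix = ancestor.endsWith("/") ? ancestor : ancestor + "/";
    return path.startsWith(prefix);
  }

  /**
   * For each mount point, return the namespaces that still need to be queried.
   * A namespace is dropped when an ancestor mount point already queries it,
   * because (assuming dest path == src path) its summary includes the subtree.
   */
  public static Map<String, Set<String>> namespacesToQuery(
      Map<String, Set<String>> mounts) {
    Map<String, Set<String>> result = new LinkedHashMap<>();
    for (Map.Entry<String, Set<String>> entry : mounts.entrySet()) {
      Set<String> remaining = new TreeSet<>(entry.getValue());
      for (Map.Entry<String, Set<String>> other : mounts.entrySet()) {
        if (isStrictAncestor(other.getKey(), entry.getKey())) {
          remaining.removeAll(other.getValue());
        }
      }
      result.put(entry.getKey(), remaining);
    }
    return result;
  }

  public static void main(String[] args) {
    // Mount table from the issue description: /A -> ns1, /A/B -> ns1,ns2.
    Map<String, Set<String>> mounts = new LinkedHashMap<>();
    mounts.put("/A", Set.of("ns1"));
    mounts.put("/A/B", Set.of("ns1", "ns2"));
    System.out.println(namespacesToQuery(mounts));
  }
}
```

With the two entries from the example, /A is still summarized in ns1, while /A/B is summarized only in ns2, so /A/B/test.txt in ns1 contributes to the total once. The separator-aware prefix check is a design choice of this sketch; a bare `startsWith` would also match sibling paths such as /AB.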



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
