[GitHub] [ozone] JacksonYao287 commented on a change in pull request #2172: HDDS-5126.Recon should check new containers of a container report wit…

GitBox Mon, 26 Apr 2021 02:56:17 -0700


JacksonYao287 commented on a change in pull request #2172:
URL: https://github.com/apache/ozone/pull/2172#discussion_r619945499




##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java
##########
@@ -122,26 +122,87 @@ public void checkAndAddNewContainer(ContainerID containerID,
           scmClient.getContainerWithPipeline(containerID.getId());
       LOG.debug("Verified new container from SCM {}, {} ",
           containerID, containerWithPipeline.getPipeline().getId());
-      // If no other client added this, go ahead and add this container.
-      if (!containerExist(containerID)) {
-        addNewContainer(containerID.getId(), containerWithPipeline);
-      }
+      // no need call "containerExist" to check, because
+      // 1 containerExist and addNewContainer can not be atomic
+      // 2 addNewContainer will double check the existence
+      addNewContainer(containerWithPipeline);
     } else {
-      // Check if container state is not open. In SCM, container state
-      // changes to CLOSING first, and then the close command is pushed down
-      // to Datanodes. Recon 'learns' this from DN, and hence replica state
-      // will move container state to 'CLOSING'.
-      ContainerInfo containerInfo = getContainer(containerID);
-      if (containerInfo.getState().equals(HddsProtos.LifeCycleState.OPEN)
-          && !replicaState.equals(ContainerReplicaProto.State.OPEN)
-          && isHealthy(replicaState)) {
-        LOG.info("Container {} has state OPEN, but Replica has State {}.",
-            containerID, replicaState);
-        try {
-          updateContainerState(containerID, FINALIZE);
-        } catch (InvalidStateTransitionException e) {
-          throw new IOException(e);
-        }
+      checkContainerStateAndUpdate(containerID, replicaState);
+    }
+  }
+
+  /**
+   * Check and add new containers in batch if not already present in Recon.
+   *
+   * @param containerReplicaProtoList list of containerReplicaProtos.
+   * @throws IOException on Error.
+   */
+  public void checkAndAddNewContainerBatch(
+      List<ContainerReplicaProto> containerReplicaProtoList)
+      throws IOException {
+    Map<Boolean, List<ContainerReplicaProto>> containers =
+        containerReplicaProtoList.parallelStream()
+        .collect(Collectors.groupingBy(c ->
+            containerExist(ContainerID.valueOf(c.getContainerID()))));
+
+    List<ContainerReplicaProto> existContainers = null;
+    if (containers.containsKey(true)) {
+      existContainers = containers.get(true);
+    }
+    List<Long> noExistContainers = null;
+    if (containers.containsKey(false)){
+      noExistContainers = containers.get(false).parallelStream().
+          map(ContainerReplicaProto::getContainerID)
+          .collect(Collectors.toList());
+    }
+
+    //for now , if any one container in noExistContainers is not found by SCM,
+    //an IOException will be throw and the whole noExistContainers will be drop.
+    //in some cases，this may slow the process for recon to learn new container,
+    //but it does not matter, just make it simple for the present
+    if (null != noExistContainers) {
+      List<ContainerWithPipeline> verifiedContainerPipeline =
+          scmClient.getContainerWithPipelineBatch(noExistContainers);

Review comment:
       I think a better choice is to rewrite `getContainerWithPipelineBatch` in
SCM so that it returns only the list of containers it found and does not throw
an exception. We should avoid one-by-one RPCs to SCM, especially when the
batch size is large.
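
   A minimal sketch of the difference between the two behaviours, not the
actual SCM code. `BatchLookupSketch`, `register`, `getBatchStrict` and
`getBatchLenient` are hypothetical names, and a `String` stands in for
`ContainerWithPipeline`:

   ```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SCM's container lookup; not the real Ozone API.
final class BatchLookupSketch {
  // Container ID -> placeholder for ContainerWithPipeline.
  private final Map<Long, String> known = new LinkedHashMap<>();

  void register(long id, String info) {
    known.put(id, info);
  }

  // Strict variant: a single unknown ID fails the whole batch.
  List<String> getBatchStrict(List<Long> ids) throws IOException {
    List<String> result = new ArrayList<>();
    for (Long id : ids) {
      String info = known.get(id);
      if (info == null) {
        throw new IOException("Container not found: " + id);
      }
      result.add(info);
    }
    return result;
  }

  // Lenient variant: unknown IDs are skipped, found ones are returned.
  List<String> getBatchLenient(List<Long> ids) {
    List<String> result = new ArrayList<>();
    for (Long id : ids) {
      String info = known.get(id);
      if (info != null) {
        result.add(info);
      }
    }
    return result;
  }
}
   ```

   With the lenient variant the caller can add every container that SCM
confirmed and simply retry the missing ones on a later report.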

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerReportHandler.java
##########
@@ -48,26 +46,16 @@ public ReconContainerReportHandler(NodeManager nodeManager,
   @Override
   public void onMessage(final ContainerReportFromDatanode reportFromDatanode,
                         final EventPublisher publisher) {
-
-    final ContainerReportsProto containerReport =
-        reportFromDatanode.getReport();
     ReconContainerManager containerManager =
         (ReconContainerManager) getContainerManager();
-
-    List<ContainerReplicaProto> reportsList = containerReport.getReportsList();
-    for (ContainerReplicaProto containerReplicaProto : reportsList) {
-      final ContainerID id = ContainerID.valueOf(
-          containerReplicaProto.getContainerID());
-      try {
-        containerManager.checkAndAddNewContainer(id,
-            containerReplicaProto.getState(),
-            reportFromDatanode.getDatanodeDetails());
-      } catch (IOException ioEx) {
-        LOG.error("Exception while checking and adding new container.", ioEx);
-      }
-      LOG.debug("Got container report for containerID {} ",
-          containerReplicaProto.getContainerID());
+    List<ContainerReplicaProto> containerReplicaProtoList =
+        reportFromDatanode.getReport().getReportsList();
+    try {
+      containerManager.checkAndAddNewContainerBatch(containerReplicaProtoList);

Review comment:
       Yes, I think so. But when I wrote this, I could not determine an
appropriate batch size, so I left it as it is.
   Maybe we can derive one from a benchmark or a production cluster. What is
more, we could write some code to adjust the batch size dynamically according
to the network environment, the load on SCM, and so on.
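
   A rough sketch of how a report could be split into fixed-size batches, plus
one possible (purely illustrative) adjustment policy that halves the batch
size after a slow call and grows it by a fixed step otherwise.
`BatchSizeSketch`, `partition`, `nextBatchSize` and all thresholds are
assumptions, not part of this PR:

   ```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; names and policy are assumptions for illustration.
final class BatchSizeSketch {
  // Split items into consecutive batches of at most batchSize elements.
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }

  // Halve the batch size when the last call exceeded the target latency,
  // otherwise grow it by step; the result is clamped to [min, max].
  static int nextBatchSize(int current, long lastCallMillis,
      long targetMillis, int min, int max, int step) {
    if (lastCallMillis > targetMillis) {
      return Math.max(min, current / 2);
    }
    return Math.min(max, current + step);
  }
}
   ```

   Suitable values for the target latency and the bounds would still have to
come from a benchmark or a production cluster.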

##########
File path: hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconContainerManager.java
##########
@@ -122,26 +122,87 @@ public void checkAndAddNewContainer(ContainerID containerID,
           scmClient.getContainerWithPipeline(containerID.getId());
       LOG.debug("Verified new container from SCM {}, {} ",
           containerID, containerWithPipeline.getPipeline().getId());
-      // If no other client added this, go ahead and add this container.
-      if (!containerExist(containerID)) {
-        addNewContainer(containerID.getId(), containerWithPipeline);
-      }
+      // no need call "containerExist" to check, because
+      // 1 containerExist and addNewContainer can not be atomic
+      // 2 addNewContainer will double check the existence

Review comment:
       In my opinion, "check the existence and add" should be an atomic
operation (just like a CAS operation), and this is actually what
ContainerStateManagerImpl#addContainer does (as you mentioned, a write lock
is involved):
   
   ```
   lock.writeLock().lock();
   try {
     if (!containers.contains(containerID)) {
       // add the container
     }
   } finally {
     lock.writeLock().unlock();
   }
   ```
   So, if we call `containerExist` before `addNewContainer`, it is as if we
perform a compare before an atomic CAS operation, which seems redundant.
   
   
   
   > There are still some steps that need to be taken before `addNewContainer` calls `ContainerStateMap$contains`
   
   Yes, so the later we call `ContainerStateMap$contains`, the better the
result we may get, because the container's existence may change within the
time window in which these steps are executed.
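
   A self-contained sketch of the atomic "check and add" pattern discussed
above, assuming a plain `HashSet` guarded by a `ReentrantReadWriteLock`.
`ContainerSetSketch` and `addIfAbsent` are hypothetical names and this is not
the `ContainerStateManagerImpl` code:

   ```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical container registry; illustrates the locking pattern only.
final class ContainerSetSketch {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private final Set<Long> containers = new HashSet<>();

  // Read-only check; the answer may be stale as soon as the lock is released.
  boolean contains(long id) {
    lock.readLock().lock();
    try {
      return containers.contains(id);
    } finally {
      lock.readLock().unlock();
    }
  }

  // Check and add under one write lock, so no other thread can add the same
  // ID in between. Returns true only for the call that actually added it.
  boolean addIfAbsent(long id) {
    lock.writeLock().lock();
    try {
      return containers.add(id);
    } finally {
      lock.writeLock().unlock();
    }
  }
}
   ```

   A caller that runs `contains` first and `addIfAbsent` afterwards gains no
extra safety, because only the check inside the write lock is authoritative.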
   



