adoroszlai commented on code in PR #5742:
URL: https://github.com/apache/ozone/pull/5742#discussion_r1418261745


##########
hadoop-hdds/common/src/main/resources/ozone-default.xml:
##########
@@ -817,6 +817,14 @@
       ids will be allocated in a single batch.
     </description>
   </property>
+  <property>
+    <name>ozone.scm.get.container.max.retry</name>
+    <value>4096</value>
+    <tag>OZONE, SCM</tag>
+    <description>
+      This property determines how many times SCM is going to retry 
getContainer operation for Ratis type container.
+    </description>
+  </property>

Review Comment:
   I don't think this should be exposed to users to configure.



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerManagerImpl.java:
##########
@@ -340,7 +340,10 @@ public ContainerInfo getMatchingContainer(final long size, 
final String owner,
       }
     } catch (Exception e) {
       LOG.warn("Container allocation failed on pipeline={}", pipeline, e);
-      return null;
+      return new ContainerInfo.Builder()
+          .setContainerID(-1)

Review Comment:
   -1 is not a valid container ID.
   
   ```
   IllegalStateException: -1 < 0
        at org.apache.ratis.util.Preconditions.assertTrue(Preconditions.java:73)
        at 
org.apache.hadoop.hdds.scm.container.ContainerInfo$Builder.setContainerID(ContainerInfo.java:398)
        at 
org.apache.hadoop.hdds.scm.container.ContainerManagerImpl.getMatchingContainer(ContainerManagerImpl.java:344)
        at 
org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.selectContainer(WritableRatisContainerProvider.java:230)
        at 
org.apache.hadoop.hdds.scm.pipeline.WritableRatisContainerProvider.getContainer(WritableRatisContainerProvider.java:109)
        at 
org.apache.hadoop.hdds.scm.pipeline.WritableContainerFactory.getContainer(WritableContainerFactory.java:74)
   ```



##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/WritableRatisContainerProvider.java:
##########
@@ -103,7 +110,13 @@ public ContainerInfo getContainer(final long size,
               excludeList);
         }
         if (containerInfo != null) {
-          return containerInfo;
+          // if containerID == -1, means Container allocation
+          // failed on selected pipeline
+          if (containerInfo.getContainerID() != -1) {
+            return containerInfo;
+          } else {
+            excludeList.addPipeline(containerInfo.getPipelineID());

Review Comment:
   Trying to exclude the faulty pipeline may not work, because:
   
   
https://github.com/apache/ozone/blob/d1a92b94a807808f3c732cd55b9a0d5f25bdff20/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/pipeline/WritableRatisContainerProvider.java#L192-L196



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to