szetszwo commented on code in PR #9401:
URL: https://github.com/apache/ozone/pull/9401#discussion_r2590190338


##########
hadoop-ozone/integration-test/src/test/java/org/apache/hadoop/ozone/client/rpc/TestContainerStateMachineFailures.java:
##########
@@ -805,6 +814,57 @@ void testContainerStateMachineDualFailureRetry()
     validateData("ratis1", 2, "ratisratisratisratis");
   }
 
+  @Test
+  void testContainerStateMachineAllNodeFailure()
+      throws Exception {
+    // mark all dn volume as full to induce failure
+    List<Pair<StorageVolume, Long>> increasedVolumeSpace = new ArrayList<>();
+    cluster.getHddsDatanodes().forEach(
+        dn -> {
+          List<StorageVolume> volumesList = dn.getDatanodeStateMachine().getContainer().getVolumeSet().getVolumesList();
+          volumesList.forEach(sv -> {
+            if (sv.getVolumeUsage().isPresent()) {
+              increasedVolumeSpace.add(Pair.of(sv, sv.getCurrentUsage().getAvailable()));
+              sv.getVolumeUsage().get().incrementUsedSpace(sv.getCurrentUsage().getAvailable());
+            }
+          });
+        }
+    );
+
+    long startTime = Time.monotonicNow();
+    ReplicationConfig replicationConfig = ReplicationConfig.fromTypeAndFactor(ReplicationType.RATIS,
+        ReplicationFactor.THREE);
+    try (OzoneOutputStream key = objectStore.getVolume(volumeName).getBucket(bucketName).createKey(
+        "testkey1", 1024, replicationConfig, new HashMap<>())) {
+
+      key.write("ratis".getBytes(UTF_8));
+      key.flush();
+      fail();
+    } catch (IOException ex) {
+      assertTrue(ex.getMessage().contains("Retry request failed. retries get failed due to exceeded" +
+          " maximum allowed retries number: 5"), ex.getMessage());
+    } finally {
+      increasedVolumeSpace.forEach(e -> e.getLeft().getVolumeUsage().ifPresent(
+          p -> p.decrementUsedSpace(e.getRight())));
+      // test execution is less than 2 sec but to be safe putting 30 sec as without fix, taking more than 60 sec
+      assertTrue(Time.monotonicNow() - startTime < 30000, "Operation took longer than expected: "
+          + (Time.monotonicNow() - startTime));
+    }
+
+    // previous pipeline gets closed due to disk full failure, so created a new pipeline and write should succeed,
+    // and this ensures later test case can pass (should not fail due to pipeline unavailability as timeout is 200ms

Review Comment:
   Let's move this to a new test file. It is hard to debug when individual tests
   affect each other.
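
   For illustration only (this is not code from the PR): the fill-then-restore pattern in the hunk above can be sketched in plain Java. `FakeVolume`, `fillAll`, and `restoreAll` are hypothetical names standing in for `StorageVolume` and the test's inline lambdas. The sketch shows the shared state that a test like this mutates; if the restore step is skipped or fails, every later test in the same file sees full volumes, which is the kind of cross-test interference a dedicated test file avoids.

```java
import java.util.ArrayList;
import java.util.List;

public class VolumeFillSketch {
  /** Hypothetical stand-in for a datanode volume; not an Ozone class. */
  static final class FakeVolume {
    private final long capacity;
    private long used;

    FakeVolume(long capacity, long used) {
      this.capacity = capacity;
      this.used = used;
    }

    long getAvailable() {
      return capacity - used;
    }

    void incrementUsedSpace(long delta) {
      used += delta;
    }

    void decrementUsedSpace(long delta) {
      used -= delta;
    }
  }

  /** Marks every volume as full and returns the amount added to each, in order. */
  static List<Long> fillAll(List<FakeVolume> volumes) {
    List<Long> added = new ArrayList<>();
    for (FakeVolume v : volumes) {
      long available = v.getAvailable();
      added.add(available);
      v.incrementUsedSpace(available);
    }
    return added;
  }

  /** Undoes fillAll so that later tests see the original free space. */
  static void restoreAll(List<FakeVolume> volumes, List<Long> added) {
    for (int i = 0; i < volumes.size(); i++) {
      volumes.get(i).decrementUsedSpace(added.get(i));
    }
  }

  public static void main(String[] args) {
    List<FakeVolume> volumes = List.of(new FakeVolume(100, 30), new FakeVolume(50, 0));
    List<Long> added = fillAll(volumes);
    try {
      // A write attempted here would find no free space on any volume.
      System.out.println("available while full: " + volumes.get(0).getAvailable());
    } finally {
      // Restore in finally so a failed assertion cannot leak full volumes.
      restoreAll(volumes, added);
    }
    System.out.println("available after restore: " + volumes.get(0).getAvailable());
  }
}
```

   Even with the restore in `finally`, side effects such as a closed pipeline remain, so isolating the test with its own cluster is still the simpler option.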



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

