[GitHub] [ozone] fapifta commented on a change in pull request #2767: Hdds 5491: EC: Write should handle node failures.

GitBox Wed, 03 Nov 2021 19:02:55 -0700


fapifta commented on a change in pull request #2767:
URL: https://github.com/apache/ozone/pull/2767#discussion_r741971338




##########
File path: 
hadoop-hdds/client/src/main/java/org/apache/hadoop/hdds/scm/storage/ECBlockOutputStream.java
##########
@@ -35,12 +38,15 @@
 import java.util.concurrent.ExecutionException;
 
 import static 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.putBlockAsync;
+import static 
org.apache.hadoop.hdds.scm.storage.ContainerProtocolCalls.writeChunkAsync;
 
 /**
  * Handles the chunk EC writes for an EC internal block.
  */
 public class ECBlockOutputStream extends BlockOutputStream{
 
+  private CompletableFuture<ContainerProtos.ContainerCommandResponseProto>
+      currentChunkRspFuture = null;

Review comment:
       Ah, if flushes won't be supported, then this is probably not a problem, 
as executePutBlock will then happen after the future has been evaluated in the 
writes (due to the sync nature of how we test for failures). I can live with 
this, though I am still wondering whether it would be safer to use two 
references, one to store futures from writes and one from putBlock, and check 
both in the failure check. I do not have any case in mind at the moment, 
without flush, where this would be a real problem in the current system, so I 
am OK with leaving this as it is.
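
       To make the two-reference idea concrete, here is a minimal sketch. It 
is not the code in this PR: the class name TwoFutureSketch, the Response type 
(a stand-in for ContainerProtos.ContainerCommandResponseProto), the field 
currentPutBlockRspFuture, and the methods onWriteChunk, onPutBlock and isFailed 
are all hypothetical names chosen for illustration. It only assumes that the 
failure check blocks on whatever futures are currently stored.

```java
import java.util.concurrent.CompletableFuture;

// Sketch only: hypothetical names, not the ECBlockOutputStream API.
public class TwoFutureSketch {

  // Hypothetical stand-in for ContainerProtos.ContainerCommandResponseProto.
  public static final class Response {
    private final boolean success;

    public Response(boolean success) {
      this.success = success;
    }

    public boolean isSuccess() {
      return success;
    }
  }

  // One reference per kind of async call, so a later putBlock
  // cannot overwrite a write future that has not been checked yet.
  private CompletableFuture<Response> currentChunkRspFuture = null;
  private CompletableFuture<Response> currentPutBlockRspFuture = null;

  public void onWriteChunk(CompletableFuture<Response> future) {
    currentChunkRspFuture = future;
  }

  public void onPutBlock(CompletableFuture<Response> future) {
    currentPutBlockRspFuture = future;
  }

  // The stream counts as failed if either outstanding call failed.
  public boolean isFailed() {
    return failed(currentChunkRspFuture) || failed(currentPutBlockRspFuture);
  }

  private static boolean failed(CompletableFuture<Response> future) {
    if (future == null) {
      return false; // nothing was issued, so nothing can have failed
    }
    try {
      // join() blocks until completion, mirroring the synchronous check.
      return !future.join().isSuccess();
    } catch (RuntimeException e) {
      // CompletionException or CancellationException: treat as a failure.
      return true;
    }
  }
}
```

       With a single shared reference, a successful putBlock future stored 
after a failed write future would hide the write failure; with two references 
the check above still reports it.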

##########
File path: 
hadoop-ozone/client/src/main/java/org/apache/hadoop/ozone/client/io/ECKeyOutputStream.java
##########
@@ -239,7 +293,15 @@ private void handleParityWrites(int parityCellSize,
     // TODO: we should alter the put block calls to share CRC to each stream.
     ECBlockOutputStreamEntry streamEntry =
         blockOutputStreamEntryPool.getCurrentStreamEntry();
+    // Since writes are async, let's check the failures once.
+    if(streamEntry.checkStreamFailures()){

Review comment:
       Yes, that's what I meant as well, but I realized I forgot my point here 
:D I wanted to suggest noting this behaviour in the comment above the if or in 
the checkStreamFailures API doc ;)

##########
File path: 
hadoop-ozone/client/src/test/java/org/apache/hadoop/ozone/client/TestOzoneECClient.java
##########
@@ -420,6 +420,80 @@ public void testPartialStripeWithPartialLastChunk()
     }
   }
 
+  @Test
+  public void testWriteShouldFailIfMoreThanParityNodesFail()
+      throws IOException {
+    testNodeFailuresWhileWriting(3, 3);
+  }
+
+  @Test
+  public void testWriteShouldSuccessIfLessThanParityNodesFail()
+      throws IOException {
+    testNodeFailuresWhileWriting(1, 2);
+  }
+
+  @Test
+  public void testWriteShouldSuccessIf4NodesFailed()
+      throws IOException {
+    testNodeFailuresWhileWriting(4, 1);
+  }
+
+  @Test
+  public void testWriteShouldSuccessIfAllNodesFailed()
+      throws IOException {
+    testNodeFailuresWhileWriting(4, 1);
+  }
+
+  public void testNodeFailuresWhileWriting(int numFailureToInject,
+      int numChunksToWriteAfterFailure) throws IOException {
+    store.createVolume(volumeName);
+    OzoneVolume volume = store.getVolume(volumeName);
+    volume.createBucket(bucketName);
+    OzoneBucket bucket = volume.getBucket(bucketName);
+
+    try (OzoneOutputStream out = bucket.createKey(keyName, 1024 * 3,
+        new ECReplicationConfig(3, 2, ECReplicationConfig.EcCodec.RS,
+            chunkSize), new HashMap<>())) {
+      for (int i = 0; i < dataBlocks; i++) {
+        out.write(inputChunks[i]);
+      }
+
+      List<DatanodeDetails> failedDNs = new ArrayList<>();
+      Map<DatanodeDetails, MockDatanodeStorage> storages =
+          ((MockXceiverClientFactory) factoryStub).getStorages();
+      DatanodeDetails[] dnDetails =
+          storages.keySet().toArray(new DatanodeDetails[storages.size()]);
+      for (int i = 0; i < numFailureToInject; i++) {
+        failedDNs.add(dnDetails[i]);
+      }
+
+      // First let's set storage as bad
+      ((MockXceiverClientFactory) factoryStub).setFailedStorages(failedDNs);
+
+      for (int i = 0; i < numChunksToWriteAfterFailure; i++) {
+        out.write(inputChunks[i]);
+      }
+    }
+    final OzoneKeyDetails key = bucket.getKey(keyName);
+    Assert.assertEquals(2, key.getOzoneKeyLocations().size());

Review comment:
       Understood. Won't this be a problem if we write more chunks via this 
general-purpose method, so that the amount of data written already spans 
multiple block groups, and the injected failures add even more block groups?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


