errose28 commented on code in PR #4972:
URL: https://github.com/apache/ozone/pull/4972#discussion_r1248191412


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##########
@@ -1336,6 +1336,7 @@ private void deleteInternal(Container container, boolean force)
 
         // Rename container location
         try {
+          container.markContainerForDelete();

Review Comment:
   With this call added, line 1380 at the end of this method can be removed.
   The `ContainerData` object should remain intact throughout this method,
   even after the container is removed from the `ContainerSet`.
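   
   A generic, non-Ozone illustration of that last point: removing an object
   from a collection only drops the collection's reference; the object
   itself stays intact for anyone still holding a reference to it.
   ```java
   import java.util.Map;
   import java.util.concurrent.ConcurrentHashMap;
   
   public final class RemovalDemo {
     public static void main(String[] args) {
       Map<Long, StringBuilder> containerSet = new ConcurrentHashMap<>();
       StringBuilder containerData = new StringBuilder("DELETED");
       containerSet.put(1L, containerData);
   
       // Removing the entry does not touch the object itself.
       containerSet.remove(1L);
   
       System.out.println(containerData); // still prints "DELETED"
     }
   }
   ```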



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java:
##########
@@ -365,6 +365,22 @@ public void markContainerUnhealthy() throws StorageContainerException {
             prevState);
   }
 
+  @Override
+  public void markContainerForDelete() throws StorageContainerException {
+    writeLock();
+    ContainerDataProto.State prevState = containerData.getState();
+    try {
+      updateContainerData(() ->
+          containerData.setState(ContainerDataProto.State.DELETED));
+      clearPendingPutBlockCache();

Review Comment:
   I don't think this line is necessary, since the container should already
   be closed and no longer accepting put blocks by the time this is called.
   It shouldn't cause any harm to keep it here, though.



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/states/endpoint/VersionEndpointTask.java:
##########
@@ -135,4 +145,27 @@ private void checkVolumeSet(MutableVolumeSet volumeSet,
       volumeSet.writeUnlock();
     }
   }
+
+  private void cleanupDeletedContainer(HddsVolume volume) {

Review Comment:
   Let's move the startup cleanup steps somewhere inside `ContainerReader`,
   so they can run as part of the initial iteration that builds up the
   container set with one thread per volume, instead of a second,
   single-threaded iteration over all containers afterwards. Probably inside
   `ContainerReader#verifyAndFixupContainerData` or
   `KeyValueContainerUtil#parseKVContainerData`.
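   
   A rough sketch of where that hook could live. The signature matches
   `parseKVContainerData`, but the body and the
   `cleanupDeletedContainer(...)` helper are assumptions for illustration,
   not the actual code:
   ```java
   // Sketch only: finish an interrupted delete during the per-volume
   // startup scan instead of in a second, single-threaded pass.
   public static void parseKVContainerData(KeyValueContainerData data,
       ConfigurationSource config) throws IOException {
     // ... existing parsing of the .container file ...
   
     if (data.getState() == ContainerDataProto.State.DELETED) {
       // A previous delete persisted the DELETED state but was interrupted
       // before cleanup finished; resume it here. cleanupDeletedContainer
       // is a hypothetical helper that removes the container directory and
       // its RocksDB entries.
       cleanupDeletedContainer(data);
       return; // do not add this container to the container set
     }
   
     // ... continue normal loading ...
   }
   ```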



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java:
##########
@@ -365,6 +365,22 @@ public void markContainerUnhealthy() throws StorageContainerException {
             prevState);
   }
 
+  @Override
+  public void markContainerForDelete() throws StorageContainerException {
+    writeLock();
+    ContainerDataProto.State prevState = containerData.getState();
+    try {
+      updateContainerData(() ->
+          containerData.setState(ContainerDataProto.State.DELETED));
+      clearPendingPutBlockCache();
+    } finally {
+      writeUnlock();
+    }
+    LOG.warn("Moving container {} to state {} from state:{}",
+        containerData.getContainerPath(), containerData.getState(),
+        prevState);

Review Comment:
   This should probably be an info log.
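   
   I.e., keeping everything else the same:
   ```java
   LOG.info("Moving container {} to state {} from state:{}",
       containerData.getContainerPath(), containerData.getState(),
       prevState);
   ```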



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java:
##########
@@ -365,6 +365,22 @@ public void markContainerUnhealthy() throws StorageContainerException {
             prevState);
   }
 
+  @Override
+  public void markContainerForDelete() throws StorageContainerException {
+    writeLock();
+    ContainerDataProto.State prevState = containerData.getState();
+    try {
+      updateContainerData(() ->
+          containerData.setState(ContainerDataProto.State.DELETED));

Review Comment:
   I think we should change `updateContainerData` to not roll back the
   container state when the update of the container file fails, similar to
   what it already does for the unhealthy state. If the file update fails we
   can continue with the rest of the delete process; otherwise, a container
   with a corrupt or missing container file could never be deleted.
   
   This could leave orphan entries in the volume's RocksDB only if:
   1. The container file update fails, and
   2. The RocksDB update also fails, or the datanode crashes between the two
   steps.
   
   These orphan entries should not be harmful, and this is a rare case, so I
   think this is ok.
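   
   A sketch of the suggested change, mirroring the existing special case for
   the unhealthy state (the method body below is an approximation, not a
   verbatim copy of the current code):
   ```java
   private void updateContainerData(Runnable update)
       throws StorageContainerException {
     ContainerDataProto.State oldState = null;
     try {
       oldState = containerData.getState();
       update.run();
       // Persist the new state to the .container file.
       updateContainerFile(getContainerFile());
     } catch (StorageContainerException ex) {
       if (oldState != null
           && containerData.getState() != ContainerDataProto.State.UNHEALTHY
           && containerData.getState() != ContainerDataProto.State.DELETED) {
         // Roll back only for states where keeping the new in-memory state
         // could mislead other components. For UNHEALTHY and DELETED, keep
         // the new state so that a corrupt or missing container file cannot
         // block the transition, and the caller can carry on with the rest
         // of the delete process.
         containerData.setState(oldState);
       }
       throw ex;
     }
   }
   ```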



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
