Re: [PR] HDDS-15066. Read-Write Lock race leave stale references to container creating orphan replicas [ozone]

via GitHub Tue, 12 May 2026 22:40:39 -0700


Copilot commented on code in PR #10109:
URL: https://github.com/apache/ozone/pull/10109#discussion_r3231756482



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/statemachine/commandhandler/DeleteBlocksCommandHandler.java:
##########
@@ -303,19 +303,31 @@ public DeleteBlockTransactionExecutionResult call() {
           if (keyValueContainer.
               writeLockTryLock(tryLockTimeoutMs, TimeUnit.MILLISECONDS)) {
             try {
-              String schemaVersion = containerData
-                  .getSupportedSchemaVersionOrDefault();
-              if (getSchemaHandlers().containsKey(schemaVersion)) {
-                schemaHandlers.get(schemaVersion).handle(containerData, tx);
+              // Re-fetch the container after acquiring the lock. DiskBalancer 
may have relocated
+              // this container to a different disk while we waited — in that 
case, the container
+              // object in ContainerSet has changed and containerData points 
to the old replica.
+              Container<?> current = containerSet.getContainer(containerId);
+              if (current == null || current.getContainerData() != 
containerData) {
+                LOG.debug("DeleteBlocks: containerData for container {} is 
stale "
+                    + ", Will retry on the new replica.",
+                    containerId);
+                lockAcquisitionFailed = true;

Review Comment:
   `lockAcquisitionFailed` is now also set when a stale container mapping is 
detected (after the write lock is acquired). `executeCmdWithRetry` treats any 
remaining `lockAcquisitionFailed` after the single retry as a lock-timeout and 
increments `incrTotalLockTimeoutTransactionCount()`, which can misclassify 
“mapping swapped” cases as timeouts. Consider distinguishing retry reasons (eg, 
separate flag/enum for stale-mapping vs tryLock-timeout) so the timeout metric 
stays accurate.
   



##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##########
@@ -279,6 +285,49 @@ public Container<?> getContainer(long containerId) {
     return containerMap.get(containerId);
   }
 
+  /**
+   * Returns the max retry for a container map swap while acquiring container 
lock.
+   * @return max retry count
+   */
+  public static int maxContainerMapSwapRetries() {
+    return MAX_CONTAINER_MAP_SWAP_RETRIES;
+  }
+
+  /**
+   * Locks the container mapped to {@code containerId} for write, and verifies 
that the instance locked is still
+   * the one stored in this set. If the mapping is swapped or the container no 
longer exists, unlocks and retries up to
+   * {@link #maxContainerMapSwapRetries()} times, then returns {@code null}.
+   *
+   * @return the locked container, or {@code null} if none mapped, or mapping 
could not be stabilized
+   */
+  @Nullable
+  public Container<?> acquireContainerLock(long containerId) {
+    for (int retry = 0; retry < MAX_CONTAINER_MAP_SWAP_RETRIES; retry++) {
+      Container<?> candidate = getContainer(containerId);
+      if (candidate == null) {
+        LOG.info("Container {} no longer present in ContainerSet, skipping.", 
containerId);
+        return null;
+      }
+      candidate.writeLock();
+      Container<?> current = getContainer(containerId);
+      if (current == null) {
+        candidate.writeUnlock();
+        LOG.info("Container {} no longer exists in ContainerSet while 
acquiring lock.", containerId);

Review Comment:
   `acquireContainerLock` logs at INFO when a container is missing / removed 
during lock acquisition. This method is likely called from background tasks 
(block deletion, delete commands, etc.), so INFO-level logging here may be 
noisy during normal churn (eg, containers concurrently deleted). Consider 
lowering these messages to DEBUG (or rate-limiting) to avoid log spam in 
production.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] HDDS-15066. Read-Write Lock race leave stale references to container creating orphan replicas [ozone]

Reply via email to