Gargi-jais11 commented on code in PR #10109:
URL: https://github.com/apache/ozone/pull/10109#discussion_r3232317054
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java:
##########
@@ -279,6 +285,49 @@ public Container<?> getContainer(long containerId) {
return containerMap.get(containerId);
}
+ /**
+ * Returns the max retry for a container map swap while acquiring container
lock.
+ * @return max retry count
+ */
+ public static int maxContainerMapSwapRetries() {
+ return MAX_CONTAINER_MAP_SWAP_RETRIES;
+ }
+
+ /**
+ * Locks the container mapped to {@code containerId} for write, and verifies
that the instance locked is still
+ * the one stored in this set. If the mapping is swapped or the container no
longer exists, unlocks and retries up to
+ * {@link #maxContainerMapSwapRetries()} times, then returns {@code null}.
+ *
+ * @return the locked container, or {@code null} if none mapped, or mapping
could not be stabilized
+ */
+ @Nullable
+ public Container<?> acquireContainerLock(long containerId) {
+ for (int retry = 0; retry < MAX_CONTAINER_MAP_SWAP_RETRIES; retry++) {
+ Container<?> candidate = getContainer(containerId);
+ if (candidate == null) {
+ LOG.info("Container {} no longer present in ContainerSet, skipping.",
containerId);
+ return null;
+ }
+ candidate.writeLock();
+ Container<?> current = getContainer(containerId);
+ if (current == null) {
+ candidate.writeUnlock();
+ LOG.info("Container {} no longer exists in ContainerSet while
acquiring lock.", containerId);
+ return null;
+ }
+ if (current != candidate) {
+ candidate.writeUnlock();
+ if (LOG.isDebugEnabled()) {
+ LOG.debug("Container {} mapping changed during lock acquisition
(attempt {}); retrying.",
Review Comment:
Sammi, initially we were simply refetching the container and checking with
the existing container if it differs we would fail but then Stephen suggested
its better to refetch the container and retry while acquiring the writeLock so
that we don't rely on RM to resend the same command after this fail.
We hold writeLock on candidate (old object), not on current (new object).
Returning current bare would hand an unlocked container to the caller. We must
first writeUnlock(candidate) then writeLock(current) — but between those two
steps the map can change again, so the loop retries until we lock whatever is
currently live in the map.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]