Gargi-jais11 commented on code in PR #8932: URL: https://github.com/apache/ozone/pull/8932#discussion_r2306123820
########## hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java: ########## @@ -897,6 +901,84 @@ public DataScanResult scanData(DataTransferThrottler throttler, Canceler cancele return checker.fullCheck(throttler, canceler); } + @Override + public void copyContainerDirectory(Path destination) throws IOException { + readLock(); + try { + // Closed/ Quasi closed containers are considered for replication by + // replication manager if they are under-replicated. + ContainerProtos.ContainerDataProto.State state = + getContainerData().getState(); + if (!(state == ContainerProtos.ContainerDataProto.State.CLOSED || + state == ContainerProtos.ContainerDataProto.State.QUASI_CLOSED)) { + throw new IllegalStateException( + "Only (quasi)closed containers can be exported, but " + + "ContainerId=" + getContainerData().getContainerID() + + " is in state " + state); + } + + if (!containerData.getSchemaVersion().equals(OzoneConsts.SCHEMA_V3)) { + compactDB(); + // Close DB (and remove from cache) to avoid concurrent modification + // while copying it. + BlockUtils.removeDB(containerData, config); + } + + if (containerData.getSchemaVersion().equals(OzoneConsts.SCHEMA_V3)) { + // Synchronize the dump and copy operation, + // so concurrent copy don't get dump files overwritten. + // We seldom got concurrent exports for a container, + // so it should not influence performance much. + synchronized (dumpLock) { + BlockUtils.dumpKVContainerDataToFiles(containerData, config); + copyContainerToDestination(destination); + } + } else { + copyContainerToDestination(destination); Review Comment: In this method, the goal is to copy the container's files to destination volume. This is a **read-only** operation from the perspective of the container's data and state. The code needs to guarantee that another thread doesn't, for example, start deleting the container while it's being copied. Acquiring a **readLock()** perfectly achieves this. It prevents any other thread from getting a **writeLock()** to modify the container, while still allowing other threads to perform concurrent read operations if needed. Using a **writeLock()** here would be too restrictive and would unnecessarily block other threads from reading the container. > if multiple thread does same thing. https://github.com/apache/ozone/blob/00ca998a966d42487f816393a59db6d20a7ac387/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueContainer.java#L932-L938 The **synchronized** block ensures that even if two threads enter the method concurrently, the process of dumping the RocksDB data to temporary files (`dumpKVContainerDataToFiles`) is serialized. This prevents the two threads from overwriting each other's temporary dump files, which would lead to corruption. Once the dump is complete and the files are copied, the lock is released, and the next thread can proceed safely. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@ozone.apache.org For additional commands, e-mail: issues-h...@ozone.apache.org