hevinhsu commented on code in PR #9855:
URL: https://github.com/apache/ozone/pull/9855#discussion_r2893903178
##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/KeyValueHandler.java:
##########
@@ -2012,6 +2021,72 @@ private boolean previousChunkPresent(BlockID blockID, long chunkOffset,
}
}
+ /**
+ * Called during reconciliation to delete block data and chunk files for a block that a peer has already deleted.
+ * This handles the case where some replicas miss block delete transactions from SCM.
+ * If the block metadata exists in RocksDB, chunk files are deleted, the block key is removed from the DB,
+ * and container stats are updated. If the block metadata does not exist in RocksDB (already deleted or never
+ * existed), falls back to deleteUnreferenced to clean up any orphaned chunk files.
+ */
+ @VisibleForTesting
+ void deleteBlockForReconciliation(KeyValueContainer container, long localBlockID) throws IOException {
+ KeyValueContainerData containerData = container.getContainerData();
+ long containerID = containerData.getContainerID();
+
+ container.writeLock();
+ try (DBHandle db = BlockUtils.getDB(containerData, conf)) {
+ String blockKey = containerData.getBlockKey(localBlockID);
+ BlockData blockData = db.getStore().getBlockDataTable().get(blockKey);
+
+ if (blockData == null) {
+ // Block metadata not in DB, but chunk files may still be on disk.
+ LOG.debug("Block {} not found in DB for container {}. Attempting to clean up unreferenced chunk files.",
+ localBlockID, containerID);
+ try {
+ deleteUnreferenced(container, localBlockID);
+ } catch (IOException e) {
+ LOG.warn("Failed to delete unreferenced files for block {} of container {}",
+ localBlockID, containerID, e);
+ }
+ return;
+ }
+
+ // Delete chunk files from disk.
+ deleteBlock(container, blockData);
+ long releasedBytes = KeyValueContainerUtil.getBlockLengthTryCatch(blockData);
+
+ // Remove block metadata from DB and update counters.
+ try (BatchOperation batch = db.getStore().getBatchHandler().initBatchOperation()) {
+ db.getStore().getBlockDataTable().deleteWithBatch(batch, blockKey);
+ // Also remove from lastChunkInfoTable for schema V2/V3.
Review Comment:
Thanks for the question. There is no explicit rollback mechanism here. Recovery relies on retry and idempotent deletion, which is the same approach used by
[`BlockDeletingTask`](https://github.com/apache/ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/statemachine/background/BlockDeletingTask.java).
Since the question is about failures *after* the block's chunk files are physically deleted (L2055), there are two cases:
1. **DB batch commit fails (L2059-2072):**
   The chunk file is gone, but the DB metadata still references it. The caller (`reconcileContainerInternal`, L1756-1761) catches the `IOException` and continues. Because the block metadata still exists in the DB, the next reconciliation detects the divergence again and retries `deleteBlockForReconciliation`. Chunk deletion is idempotent, so the already-missing chunk file does not block the retry, and the batch commit gets another chance to succeed.
   This is the same trade-off `BlockDeletingTask` makes; see the TODO in `deleteTransactions`
   [L470-473](https://github.com/apache/ozone/blob/master/hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/keyvalue/statemachine/background/BlockDeletingTask.java#L469-L473)
   acknowledging this gap.
2. **In-memory stats update fails (L2075-2077):**
   These operations only update in-memory counters (`decDeletion`, `decrementUsedSpace`) and do not throw `IOException`. Even if the process dies at this point (e.g. crash or OOM), the DB state is already correct, and the in-memory statistics are rebuilt from RocksDB on DN restart.

So the behavior is consistent with the eventual-consistency model already used by `BlockDeletingTask`.
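
To make the two cases concrete, here is a minimal, self-contained sketch of the retry-plus-idempotent-deletion pattern. It is not Ozone code: `ReconcileRetrySketch`, `blockTable`, `failNextCommit`, and `rebuildUsedBytes` are hypothetical names chosen for illustration, a plain map stands in for the RocksDB block table, and the commit failure is simulated with a flag.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the reconciliation delete path; no Ozone classes are used.
class ReconcileRetrySketch {
  // Stand-in for the persisted block table: block ID -> block length in bytes.
  final Map<Long, Long> blockTable = new ConcurrentHashMap<>();
  final Path chunkDir;
  boolean failNextCommit; // hook that simulates a failed metadata commit
  long usedBytes;         // in-memory counter, derived from blockTable

  ReconcileRetrySketch(Path chunkDir) {
    this.chunkDir = chunkDir;
  }

  void putBlock(long id, byte[] data) throws IOException {
    Files.write(chunkDir.resolve(id + ".chunk"), data);
    blockTable.put(id, (long) data.length);
    usedBytes += data.length;
  }

  void deleteBlock(long id) throws IOException {
    Long length = blockTable.get(id);
    // Idempotent step: deleteIfExists does not throw when the file is already gone,
    // so a retry after a partial failure passes straight through.
    Files.deleteIfExists(chunkDir.resolve(id + ".chunk"));
    if (length == null) {
      return; // metadata already removed; only orphan cleanup was needed
    }
    if (failNextCommit) {
      failNextCommit = false;
      throw new IOException("simulated metadata commit failure");
    }
    blockTable.remove(id); // the "commit": metadata removed only after the file
    usedBytes -= length;   // in-memory update, performed last
  }

  // Mirrors a caller that catches the exception and continues; returns success.
  boolean reconcileOnce(long id) {
    try {
      deleteBlock(id);
      return true;
    } catch (IOException e) {
      return false; // metadata is still present, so a later pass retries
    }
  }

  // Simulates a restart: the counter is recomputed from the persisted table.
  long rebuildUsedBytes() {
    usedBytes = blockTable.values().stream().mapToLong(Long::longValue).sum();
    return usedBytes;
  }

  public static void main(String[] args) throws IOException {
    ReconcileRetrySketch s = new ReconcileRetrySketch(Files.createTempDirectory("chunks"));
    s.putBlock(1L, new byte[16]);
    s.failNextCommit = true;
    System.out.println("first pass ok: " + s.reconcileOnce(1L));
    System.out.println("metadata still present: " + s.blockTable.containsKey(1L));
    System.out.println("second pass ok: " + s.reconcileOnce(1L));
    System.out.println("used bytes after rebuild: " + s.rebuildUsedBytes());
  }
}
```

In the sketch, the first pass removes the chunk file and then fails before the metadata is removed (case 1); the second pass finds the metadata, skips the missing file, and completes. `rebuildUsedBytes` corresponds to case 2, where the counter is derived state and can be recomputed from the persisted table.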
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]