kerneltime opened a new pull request, #10496: URL: https://github.com/apache/ozone/pull/10496
## What changes were proposed in this pull request? When a closed container is reconciled against a peer, `KeyValueHandler.reconcileChunksPerBlock` pulls chunks from the peer to repair the local replica. The block commit sequence ID (BCSID) is a high-water mark: it is meant to assert that the replica holds, durably, every committed chunk up to that sequence id. The method's contract states the BCSID is advanced to the peer's value only when the entire block is read and written successfully. The per-chunk loop can stop early in two ways: 1. A chunk read or write throws `IOException`. This path sets the loop's success flag to `false`, and the block is committed with `overwriteBscId == false`, so the BCSID is left alone. Correct. 2. The loop reaches a chunk whose preceding chunk is missing locally (a hole). Writing it would leave a gap in the block file, which is not supported, so the loop breaks. This path did **not** clear the success flag. Because the success flag is initialized to `true` and the hole path left it untouched, the block was committed as if fully repaired: the BCSID was overwritten with the peer's higher value even though the chunks past the hole were never ingested, and the container BCSID was advanced to match. The replica then advertised a sequence id whose backing data it does not actually hold. This is reachable in normal operation, not only under corruption. The container scanner deliberately omits a missing chunk from the merkle tree it builds (`KeyValueContainerCheck`: "Missing chunks should not be added to the merkle tree."), so a healthy peer that is itself missing one chunk in the middle of a block advertises exactly such a gapped list. SCM treats the reported BCSID as a freshness/completeness signal (`AbstractContainerReportHandler` raises the container's recorded sequence id to a healthy replica's value, and the force-close and delete-versus-resurrect paths compare against it), so an inflated BCSID can cause a holed replica missing committed data to be treated as the most up-to-date copy, and a complete replica to be deleted as redundant. The fix treats the hole exit as a partial result, exactly like the `IOException` exit: it clears the success flag before breaking out of the loop. Chunks ingested before the hole are still committed, but the BCSID is not advanced while the block remains incomplete. The post-reconciliation scanner rebuilds the merkle tree from what is on disk, and a later reconciliation round that first fills the missing chunk can then advance the BCSID legitimately. To make the path testable, `reconcileChunksPerBlock` is now package-private (`@VisibleForTesting`) and the block input stream factory can be substituted by a test. ## What is the link to the Apache JIRA https://issues.apache.org/jira/browse/HDDS-15542 ## How was this patch tested? Added `TestReconcileChunksPerBlockHoleBcsId`, a single-threaded unit test that exercises the real `reconcileChunksPerBlock` against a real closed container holding only the offset-0 chunk (BCSID 1), with a mocked peer that advertises BCSID 99 and a chunk merkle list `{CHUNK_LEN, 3*CHUNK_LEN}` (the chunk at `2*CHUNK_LEN` is omitted, so `3*CHUNK_LEN` sits past a hole). The test asserts that, because a hole remains, the block and container BCSID are not advanced to the peer value. The test fails on the unmodified code (the BCSID is advanced to 99) and passes with this change (it stays at 1), so it doubles as the regression guard. Run locally: ``` mvn -pl hadoop-hdds/container-service test -Dtest='TestReconcileChunksPerBlockHoleBcsId' ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
