Arun Sarin created HDDS-15651:
---------------------------------
Summary: [DiskBalancer] markContainerForDelete failure is treated
as a successful move
Key: HDDS-15651
URL: https://issues.apache.org/jira/browse/HDDS-15651
Project: Apache Ozone
Issue Type: Bug
Components: Ozone Datanode
Reporter: Arun Sarin
Attachments: repro_HDDS_markContainerForDelete_BEFORE_fix.log
During a DiskBalancer container move, the datanode copies the container to the
destination volume, updates ContainerSet to point to the new replica, and then
calls markContainerForDelete() on the old source replica.
If markContainerForDelete() fails, DiskBalancer still treats the move as
successful. Success metrics are updated and the old replica may be queued for
delayed deletion, even though the source replica was never properly marked
DELETED. This can leave duplicate replicas on disk and make disk usage and
balancer status misleading.
While reviewing DiskBalancerService.DiskBalancerTask.call(), I noticed that
moveSucceeded is set to true before markContainerForDelete() is called. If the
mark call fails, the error is only logged and the move is still counted as a
success.
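The ordering problem can be illustrated with a minimal, self-contained sketch.
This is not the actual DiskBalancerService code: the class BuggyMoveSketch, the
SourceMarker interface, and the plain counter fields are hypothetical stand-ins
used only to show how setting the flag before the mark step hides the failure.

```java
import java.io.IOException;

/** Simplified model of the ordering described above; not the real Ozone code. */
class BuggyMoveSketch {

    /** Hypothetical stand-in for the step that marks the old source replica. */
    interface SourceMarker {
        void markContainerForDelete() throws IOException;
    }

    // Hypothetical stand-ins for the balancer's success/failure metrics.
    long successCount = 0;
    long failureCount = 0;

    void moveContainer(SourceMarker marker) {
        boolean moveSucceeded = false;
        try {
            // The copy to the destination volume and the ContainerSet update
            // would happen here in the real task.
            moveSucceeded = true; // set before the mark step, as in the report

            try {
                marker.markContainerForDelete();
            } catch (IOException e) {
                // The error is only logged; moveSucceeded stays true.
                System.err.println("Failed to mark the old container for delete: "
                        + e.getMessage());
            }
        } finally {
            if (moveSucceeded) {
                successCount++;
            } else {
                failureCount++;
            }
        }
    }
}
```

Calling moveContainer() with a marker that always throws IOException leaves
successCount at 1 and failureCount at 0 in this model, which mirrors the
behavior reported here.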
I added a unit test to reproduce this:
TestDiskBalancerTask#moveSucceedsDespiteMarkContainerForDeleteFailure
The test simulates a markContainerForDelete() failure and asserts that the move
is reported as failed, with no duplicate replica left active on the
destination.
*Steps to reproduce*
1. Run the unit test:
mvn test -pl hadoop-hdds/container-service -am \
-Dtest=TestDiskBalancerTask#moveSucceedsDespiteMarkContainerForDeleteFailure \
-DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false
2. On current master (before the fix), the test fails with 8 failures (one per
container schema variant). Example:
successCount should be 0 when markContainerForDelete fails
expected: 0 but was: 1
*Expected behavior*
- The move should be counted as a failure (failureCount increases, successCount
stays 0).
- ContainerSet should keep the source replica as the active one.
- Source and destination volume used space should not reflect a completed move.
- Any partially created destination replica should be cleaned up.
*Actual behavior*
- successCount is incremented and successBytes is updated.
- Log message: "Failed to mark the old container <id> for delete. It will be
handled after DN restart."
- ContainerSet points to the new replica on the destination volume.
- The old source replica directory still exists on disk.
- The old replica is still queued for delayed deletion.
*Impact*
- Operators see a successful move in DiskBalancer metrics when it did not
fully complete.
- Duplicate replicas can consume extra disk space on the datanode.
- Source volume may stay over-utilized and balancing may not progress as
expected until datanode restart.
*Suggested fix*
Only mark the move as successful after markContainerForDelete() succeeds. If
mark fails, roll back the move: restore ContainerSet to the source replica,
revert destination volume used space, and delete the destination replica
directory. Do not update success metrics or queue the old replica for deletion.
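A rough sketch of the suggested ordering follows. It is an illustration under
stated assumptions, not a patch: FixedMoveSketch, SourceMarker, and the
in-memory maps and sets that stand in for ContainerSet, volume used space,
on-disk replicas, and the delayed-deletion queue are all hypothetical. Source
volume accounting is left out on purpose.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Simplified model of the suggested fix; all names here are hypothetical. */
class FixedMoveSketch {

    /** Hypothetical stand-in for marking the old source replica for deletion. */
    interface SourceMarker {
        void markContainerForDelete(long containerId) throws IOException;
    }

    // In-memory stand-ins for datanode state.
    final Map<Long, String> containerSet = new HashMap<>(); // id -> active volume
    final Map<String, Long> usedBytes = new HashMap<>();    // volume -> used space
    final Set<String> replicasOnDisk = new HashSet<>();     // "volume/id"
    final List<Long> pendingDeletion = new ArrayList<>();   // delayed deletion

    long successCount = 0;
    long failureCount = 0;
    long successBytes = 0;

    boolean moveContainer(long id, long size, String src, String dst,
                          SourceMarker marker) {
        // Copy to the destination and switch the active replica.
        replicasOnDisk.add(dst + "/" + id);
        usedBytes.merge(dst, size, Long::sum);
        containerSet.put(id, dst);

        try {
            marker.markContainerForDelete(id);
        } catch (IOException e) {
            // Roll back: source replica becomes active again, destination
            // used space is reverted, destination replica is removed.
            containerSet.put(id, src);
            usedBytes.merge(dst, -size, Long::sum);
            replicasOnDisk.remove(dst + "/" + id);
            failureCount++;
            return false;
        }

        // Only now is the move counted as successful and the old replica queued.
        pendingDeletion.add(id);
        successCount++;
        successBytes += size;
        return true;
    }
}
```

In this model a failing marker leaves ContainerSet pointing at the source
volume, the destination used space unchanged, no destination replica on disk,
and nothing queued for deletion, which matches the expected behavior listed
above.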