[ https://issues.apache.org/jira/browse/HDDS-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gargi Jaiswal reassigned HDDS-15651:
------------------------------------
Assignee: Gargi Jaiswal
> [DiskBalancer] markContainerForDelete failure is treated as a successful move
> -----------------------------------------------------------------------------
>
> Key: HDDS-15651
> URL: https://issues.apache.org/jira/browse/HDDS-15651
> Project: Apache Ozone
> Issue Type: Bug
> Components: Ozone Datanode
> Reporter: Arun Sarin
> Assignee: Gargi Jaiswal
> Priority: Major
> Labels: pull-request-available
> Attachments: repro_HDDS_markContainerForDelete_BEFORE_fix.log
>
>
> During a DiskBalancer container move, the datanode copies the container to the
> destination volume, updates ContainerSet to point to the new replica, and then
> calls markContainerForDelete() on the old source replica.
>
> If markContainerForDelete() fails, DiskBalancer still treats the move as
> successful. Success metrics are updated and the old replica may be queued for
> delayed deletion, even though the source replica was never properly marked
> DELETED. This can leave duplicate replicas on disk and make disk usage and
> balancer status misleading.
>
> While reviewing DiskBalancerService.DiskBalancerTask.call(), I noticed that
> moveSucceeded is set to true before markContainerForDelete() is called. If
> markContainerForDelete() fails, the error is only logged and the move is still
> counted as a success.
>
> I added a unit test to reproduce this:
> TestDiskBalancerTask#moveSucceedsDespiteMarkContainerForDeleteFailure
>
> The test simulates a markContainerForDelete() failure and asserts that the
> move is reported as failed and that no duplicate replica is left active on
> the destination.
>
> *Steps to reproduce*
> 1. Run the unit test:
>
> mvn test -pl hadoop-hdds/container-service -am \
> -Dtest=TestDiskBalancerTask#moveSucceedsDespiteMarkContainerForDeleteFailure \
> -DfailIfNoTests=false -Dsurefire.failIfNoSpecifiedTests=false
>
> 2. On current master (before fix), the test fails with 8 failures (one per
> container schema variant). Example:
>
> successCount should be 0 when markContainerForDelete fails
> expected: 0 but was: 1
>
> *Expected behavior*
> - The move should be counted as a failure (failureCount increases,
> successCount stays 0).
> - ContainerSet should keep the source replica as the active one.
> - Source and destination volume used space should not reflect a completed
> move.
> - Any partially created destination replica should be cleaned up.
>
> *Actual behavior*
> - successCount is incremented and successBytes is updated.
> - Log message: "Failed to mark the old container <id> for delete. It will be
> handled after DN restart."
> - ContainerSet points to the new replica on the destination volume.
> - The old source replica directory still exists on disk.
> - The old replica is still queued for delayed deletion.
>
> *Impact*
> - Operators see a successful move in DiskBalancer metrics when it did not
> fully complete.
> - Duplicate replicas can consume extra disk space on the datanode.
> - Source volume may stay over-utilized and balancing may not progress as
> expected until datanode restart.
>
> *Suggested fix*
> Only mark the move as successful after markContainerForDelete() succeeds. If
> it fails, roll back the move: restore ContainerSet to the source replica,
> revert destination volume used space, and delete the destination replica
> directory. Do not update success metrics or queue the old replica for
> deletion.
>
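The ordering described under "Suggested fix" can be sketched roughly as follows. This is a simplified, self-contained model and not the actual Ozone code: the class MoveSketch, the SourceMarker interface, and the map-based stand-ins for ContainerSet and volume usage are all hypothetical names introduced for illustration. Only markContainerForDelete, successCount, and failureCount are taken from the report.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of a DiskBalancer move (not Ozone's real classes). */
public class MoveSketch {

  /** Stand-in for the step that marks the old source replica DELETED. */
  public interface SourceMarker {
    void markContainerForDelete(long containerId) throws Exception;
  }

  // containerId -> volume holding the active replica (stand-in for ContainerSet)
  public final Map<Long, String> activeReplica = new HashMap<>();
  // volume -> used bytes (stand-in for per-volume usage accounting)
  public final Map<String, Long> usedBytes = new HashMap<>();
  // containers whose old replica is queued for delayed deletion
  public final Set<Long> queuedForDeletion = new HashSet<>();
  public long successCount;
  public long failureCount;

  public boolean move(long id, long size, String src, String dst, SourceMarker marker) {
    // Copy to the destination and switch the active replica (modelled as bookkeeping only).
    usedBytes.merge(dst, size, Long::sum);
    activeReplica.put(id, dst);

    try {
      marker.markContainerForDelete(id);
    } catch (Exception e) {
      // Roll back: the source stays the active replica, destination usage is reverted.
      // Real code would also delete the destination replica directory at this point.
      activeReplica.put(id, src);
      usedBytes.merge(dst, -size, Long::sum);
      failureCount++;
      return false;
    }

    // Success is recorded only after the source replica was marked for delete.
    queuedForDeletion.add(id);
    successCount++;
    return true;
  }

  public static void main(String[] args) {
    MoveSketch m = new MoveSketch();
    m.activeReplica.put(1L, "src");
    m.usedBytes.put("dst", 0L);

    // Simulate a markContainerForDelete() failure.
    boolean moved = m.move(1L, 100L, "src", "dst",
        id -> { throw new IOException("simulated failure"); });

    System.out.println("moved=" + moved
        + " successCount=" + m.successCount
        + " failureCount=" + m.failureCount
        + " active=" + m.activeReplica.get(1L));
  }
}
```

In this sketch the success counter and the deletion queue are touched only on the path where marking succeeded, so a failed mark cannot be reported as a successful move.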
--
This message was sent by Atlassian Jira
(v8.20.10#820010)