arunsarin85 opened a new pull request, #10593:
URL: https://github.com/apache/ozone/pull/10593

   ## What changes were proposed in this pull request?
   When markContainerForDelete() fails after a container has been copied to the 
destination volume, treat the move as a failure instead of success.
   Restore ContainerSet to the source replica, revert destination volume 
accounting, delete the destination replica directory, and do not queue the 
source replica for delayed deletion. Add a regression test.
   
   Please describe your PR in detail:
   Bug: DiskBalancer reported a successful move even when 
markContainerForDelete() failed on the source replica.
   Fix: On mark failure, the move is rolled back and counted as a failure.
   
   Before (bug) | After (fix)
   -- | --
   moveSucceeded = true set before calling markContainerForDelete() | moveSucceeded = true only after mark succeeds
   Success metrics updated regardless of mark outcome | Success metrics updated only on full success
   ContainerSet kept pointing at destination replica | ContainerSet restored to source replica
   Destination volume used space left incremented | Destination used space decremented
   Destination replica directory left on disk | Destination replica directory deleted
   Source replica queued for delayed deletion | Source replica not queued
   Log: "It will be handled after DN restart" | Log: "Rolling back move"
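
The ordering change in the table can be illustrated with a minimal sketch. This is not the actual DiskBalancerTask code: the `MoveRollbackSketch` class, its `Replica` type, the in-memory maps standing in for ContainerSet and volume accounting, and the `failOnMark` hook are all hypothetical names introduced only for illustration. The point it shows is that the success flag, the success metric, and the delayed-deletion queue are touched only after the mark step returns, and that a mark failure undoes each destination-side effect.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical, simplified model of the move-then-mark flow; not Ozone's real classes. */
class MoveRollbackSketch {

  /** Stand-in for a container replica on one volume. */
  static class Replica {
    final long containerId;
    final String volume;
    final long sizeBytes;
    final boolean failOnMark; // assumption: hook to simulate a failing mark step

    Replica(long containerId, String volume, long sizeBytes, boolean failOnMark) {
      this.containerId = containerId;
      this.volume = volume;
      this.sizeBytes = sizeBytes;
      this.failOnMark = failOnMark;
    }

    void markContainerForDelete() {
      if (failOnMark) {
        throw new IllegalStateException("simulated mark failure");
      }
    }

    String dir() {
      return volume + "/" + containerId;
    }
  }

  // In-memory stand-ins for ContainerSet, volume accounting, on-disk dirs, and the deletion queue.
  final Map<Long, Replica> containerSet = new HashMap<>();
  final Map<String, Long> usedBytes = new HashMap<>();
  final Set<String> replicaDirs = new HashSet<>();
  final List<Replica> delayedDeletionQueue = new ArrayList<>();
  int successCount;
  int failureCount;

  boolean move(Replica source, String destVolume) {
    // Copy phase: create the destination replica and point the container set at it.
    Replica dest = new Replica(source.containerId, destVolume, source.sizeBytes, false);
    replicaDirs.add(dest.dir());
    usedBytes.merge(destVolume, source.sizeBytes, Long::sum);
    containerSet.put(source.containerId, dest);

    boolean moveSucceeded = false;
    try {
      source.markContainerForDelete();
      // Only reached when the mark step succeeded.
      moveSucceeded = true;
      delayedDeletionQueue.add(source);
      successCount++;
    } catch (RuntimeException e) {
      // Roll back: restore the source replica and undo destination-side effects.
      containerSet.put(source.containerId, source);
      usedBytes.merge(destVolume, -source.sizeBytes, Long::sum);
      replicaDirs.remove(dest.dir());
      failureCount++;
    }
    return moveSucceeded;
  }

  public static void main(String[] args) {
    MoveRollbackSketch sketch = new MoveRollbackSketch();
    Replica source = new Replica(1L, "vol-src", 100L, true);
    sketch.containerSet.put(1L, source);
    System.out.println("move succeeded: " + sketch.move(source, "vol-dst"));
    System.out.println("container set volume: " + sketch.containerSet.get(1L).volume);
  }
}
```

In this sketch the source replica is never queued for deletion on the failure path, so it remains the only replica the container set knows about.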
   
   
   ## What is the link to the Apache JIRA
   
   https://issues.apache.org/jira/browse/HDDS-15651
   
   ## How was this patch tested?
   
   Regression test
   TestDiskBalancerTask.moveSucceedsDespiteMarkContainerForDeleteFailure 
(HDDS-15651):
   
   - Creates a CLOSED container on the source volume
   - Sets replicaDeletionDelay = 60_000 ms so delayed deletion does not hide duplicate-replica bugs
   - Spies on the source KeyValueContainer and makes markContainerForDelete() throw
   - Runs DiskBalancerTask.call()
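
The fault-injection step in the list above can be sketched without a mocking library. The example below is an assumption-laden illustration, not the test from this PR: `Container`, `FailingMarkContainer`, and the toy `call` method are hypothetical stand-ins, the checked `IOException` is an assumed exception type, and the real test presumably wraps an actual KeyValueContainer rather than subclassing a stub.

```java
import java.io.IOException;

/** Hypothetical stand-ins; the real test would use Ozone's own container and task classes. */
class MarkFailureInjectionSketch {

  /** Minimal container exposing only the method the test needs to break. */
  static class Container {
    void markContainerForDelete() throws IOException {
      // Normal path: nothing to do in this sketch.
    }
  }

  /** Hand-rolled "spy": same type as Container, but the mark step always throws. */
  static class FailingMarkContainer extends Container {
    int markCalls;

    @Override
    void markContainerForDelete() throws IOException {
      markCalls++;
      throw new IOException("injected markContainerForDelete failure");
    }
  }

  /** Toy task body: reports success only when the mark step completes. */
  static boolean call(Container source) {
    try {
      source.markContainerForDelete();
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    FailingMarkContainer spy = new FailingMarkContainer();
    System.out.println("move succeeded: " + call(spy));
    System.out.println("mark calls: " + spy.markCalls);
  }
}
```

Counting the calls on the failing container lets a test confirm that the failure came from the injected mark step and not from an earlier phase of the move.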
   
   
[repro_HDDS_markContainerForDelete_BEFORE_fix.log](https://github.com/user-attachments/files/29266518/repro_HDDS_markContainerForDelete_BEFORE_fix.log)
   
   
[repro_HDDS_markContainerForDelete_AFTER_fix.log](https://github.com/user-attachments/files/29266552/repro_HDDS_markContainerForDelete_AFTER_fix.log)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


