Gargi-jais11 opened a new pull request, #10489:
URL: https://github.com/apache/ozone/pull/10489
## What changes were proposed in this pull request?
In `DiskBalancerService.java` at **lines 642-643**:
`pendingDeletionContainers.put(clock.millis() + replicaDeletionDelay,
container);`
After a successful container move, the old replica is queued for deletion in
**pendingDeletionContainers**, keyed by `clock.millis() +
replicaDeletionDelay`. That key has only millisecond precision, so if two moves
finish in the same millisecond, they get the same key.
Because the map stores one container per key, the second `put()` overwrites
the first. The overwritten container is never scheduled for deletion, so its
old replica stays on disk and wastes space. With `parallelThread > 1`, this is
realistic under normal load.
The key is: **clock.millis() + replicaDeletionDelay**
Both parts are the same for every thread finishing at the same millisecond:
1. **clock.millis()**: wall clock, millisecond resolution. All JVM threads
share the same clock.
2. **replicaDeletionDelay**: a single constant (default 5 minutes = 300,000
ms) shared by the whole service.
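The collision can be reproduced in isolation. The following is a minimal sketch, not code from `DiskBalancerService`: the class name `PendingDeletionCollision`, the helper `queueAll`, the use of `String` in place of the container type, and the choice of `ConcurrentSkipListMap` are all assumptions made for illustration. A fixed `java.time.Clock` stands in for two moves finishing in the same millisecond.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical stand-alone reproduction; not part of the Ozone code base.
final class PendingDeletionCollision {

  // Queues each old replica under clock.millis() + replicaDeletionDelay,
  // mirroring the keying scheme described above, and returns the map.
  static Map<Long, String> queueAll(Clock clock, long replicaDeletionDelay,
      String... oldReplicas) {
    // Assumption: one value per key, as in the description of the bug.
    Map<Long, String> pendingDeletionContainers = new ConcurrentSkipListMap<>();
    for (String replica : oldReplicas) {
      pendingDeletionContainers.put(
          clock.millis() + replicaDeletionDelay, replica);
    }
    return pendingDeletionContainers;
  }

  public static void main(String[] args) {
    // A fixed clock makes both "moves" finish at millisecond 1,000,000.
    Clock clock = Clock.fixed(Instant.ofEpochMilli(1_000_000L), ZoneOffset.UTC);
    Map<Long, String> pending =
        queueAll(clock, 300_000L, "C-101_old", "C-202_old");
    // Only one entry survives: the second put() replaced the first.
    System.out.println(pending.size() + " entry: " + pending);
  }
}
```

With both replicas queued under key 1,300,000, the map ends up holding a single entry for `C-202_old`.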
**Step-by-step analysis**
Assume **replicaDeletionDelay = 300,000 ms** and **parallelThread = 5**.
Five container moves run in parallel. Moves for containers C-101 and C-202
both finish at clock.millis() = 1,000,000:
```
Thread-1 (moving C-101):
key = 1,000,000 + 300,000 = 1,300,000
pendingDeletionContainers.put(1_300_000, C-101_old_replica)
Map now: { 1_300_000 → C-101_old }
Thread-2 (moving C-202), same millisecond:
key = 1,000,000 + 300,000 = 1,300,000 <------ identical key!
pendingDeletionContainers.put(1_300_000, C-202_old_replica)
Map now: { 1_300_000 → C-202_old } <--------- C-101_old silently GONE
```
C-101's old replica has been permanently dropped from the pending-deletion
queue and will never be scheduled for deletion. C-202's old replica gets
deleted correctly.
- C-101's old replica is never visited. It sits on the source disk forever.
- The container is marked DELETED in metadata (line 605), so it won't be
served.
- But its data directory (chunks, container.db, .container descriptor) remains
on the source disk.
- `decrementUsedSpace` is only called inside `deleteContainer()`, so the
source volume's used-space counter is never corrected.
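One way to make such a queue tolerate identical timestamps is to keep a list of replicas per key instead of a single value. The sketch below illustrates that idea only; it is not necessarily the change made in this PR. The class `PendingDeletionQueue`, its methods `add`, `drainDue` and `size`, and the use of `String` for the container type are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: a pending-deletion queue that keeps every replica
// even when several share the same deletion timestamp.
final class PendingDeletionQueue {
  // Deletion time in millis -> all replicas due at that time.
  private final TreeMap<Long, List<String>> pending = new TreeMap<>();

  // Appends to the list for this timestamp instead of overwriting it.
  synchronized void add(long deleteAtMillis, String replica) {
    pending.computeIfAbsent(deleteAtMillis, k -> new ArrayList<>())
        .add(replica);
  }

  // Removes and returns every replica whose deletion time is <= nowMillis.
  synchronized List<String> drainDue(long nowMillis) {
    List<String> due = new ArrayList<>();
    Map<Long, List<String>> head = pending.headMap(nowMillis, true);
    head.values().forEach(due::addAll);
    head.clear(); // headMap is a view, so this removes from 'pending'
    return due;
  }

  // Total number of queued replicas across all timestamps.
  synchronized int size() {
    return pending.values().stream().mapToInt(List::size).sum();
  }

  public static void main(String[] args) {
    PendingDeletionQueue queue = new PendingDeletionQueue();
    queue.add(1_300_000L, "C-101_old");
    queue.add(1_300_000L, "C-202_old"); // same key, both are kept
    System.out.println(queue.size());
    System.out.println(queue.drainDue(1_300_000L));
  }
}
```

The methods are `synchronized` to keep the sketch simple and obviously correct with several mover threads; a production version would likely weigh that against a lock-free structure or a key that is unique per container.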
## What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-15524
## How was this patch tested?
Added unit test in `TestDiskBalancerTask`:
**Before fix:** If two container moves completed in the same millisecond,
they got the same key, so one entry was overwritten. Its replica was left on
the source volume until RM detected and deleted it, leaving that space
unreclaimed on the source volume in the meantime.
The added test failed with the following error:
```
// With the bug: map size is 1; with fix: 2 queued replicas.
org.opentest4j.AssertionFailedError:
Expected :2
Actual :1
at org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
at org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
at org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:531)
at org.apache.hadoop.ozone.container.diskbalancer.TestDiskBalancerTask.testPendingDeletionDoesNotDropReplicasOnSameMillisecondKey(TestDiskBalancerTask.java:739)
at java.lang.reflect.Method.invoke(Method.java:498)
...
```
**After fix:** Both containers that complete their move at the same time are
correctly deleted from the source volume. The test case passed.