ChenSammi commented on code in PR #9505:
URL: https://github.com/apache/ozone/pull/9505#discussion_r2629666258


##########
hadoop-hdds/container-service/src/main/java/org/apache/hadoop/ozone/container/diskbalancer/policy/DefaultVolumeChoosingPolicy.java:
##########
@@ -50,67 +52,43 @@ public DefaultVolumeChoosingPolicy(ReentrantLock globalLock) {
 
   @Override
   public Pair<HddsVolume, HddsVolume> chooseVolume(MutableVolumeSet volumeSet,
-      double threshold, Map<HddsVolume, Long> deltaMap, long containerSize) {
+      double thresholdPercentage, Map<HddsVolume, Long> deltaMap, long containerSize) {
     lock.lock();
     try {
       // Create truly immutable snapshot of volumes to ensure consistency
-      ImmutableList<HddsVolume> allVolumes = DiskBalancerVolumeCalculation.getImmutableVolumeSet(volumeSet);
-
+      final List<StorageVolume> allVolumes = volumeSet.getVolumesList();
       if (allVolumes.size() < 2) {
         return null; // Can't balance with less than 2 volumes.
       }
-      
-      // Calculate ideal usage using the same immutable volume
-      double idealUsage = DiskBalancerVolumeCalculation.getIdealUsage(allVolumes, deltaMap);
 
-      // Threshold is given as a percentage
-      double normalizedThreshold = threshold / 100;
-      List<HddsVolume> volumes = allVolumes
-          .stream()
-          .filter(volume -> {
-            SpaceUsageSource usage = volume.getCurrentUsage();

Review Comment:
   @szetszwo, I got the point after a second thought. I had understood the 
filtering out of volumes below the threshold as an efficient way to select 
destVolume, on the assumption that if one volume is beyond the utilization 
threshold, there is likely another volume below it. I now realize there are 
other cases: one volume is beyond the threshold and no volume is under it, or 
one volume is under the threshold and no volume is beyond it.
   
   (1) one volume beyond threshold and no volumes under threshold
   ```
   Disk1, 30, 100
   Disk2, 30, 100
   Disk3, 40, 100
   
   100 / 300 = 33.3%
   Disk1: 30%
   Disk2: 30%
   Disk3: 40%
   
   Threshold: 10
   Disk utilization range (23.3, 43.3)
   Out range volume list: NULL
   
   Threshold: 5
   Disk utilization range (28.3, 38.3)
   Out range volume list: Disk3
   ```
   
   (2) one volume under threshold and no volumes beyond threshold
   ```
   Disk1, 30, 100
   Disk2, 30, 100
   Disk3, 20, 100
   
   80 / 300 = 26.7%
   Disk1: 30%
   Disk2: 30%
   Disk3: 20%
   
   Threshold: 10
   Disk utilization range (16.7, 36.7)
   Out range volume list: NULL
   
   Threshold: 5
   Disk utilization range (21.7, 31.7)
   Out range volume list: Disk3
   ```
   
   So the two cases above are typical cases that the existing logic does not 
cover. And it looks like the new logic does not cover case (2).
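
To make the arithmetic in both cases easy to re-check, here is a minimal sketch of the range computation used in the examples above. It is not the code in this PR: the class `ThresholdRangeSketch` and the method `outOfRange` are hypothetical names, and it assumes the allowed range is the ideal utilization (total used / total capacity, as a percentage) plus or minus the threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Ozone.
public class ThresholdRangeSketch {

  /**
   * Returns the indexes of the volumes whose utilization (in percent) falls
   * outside (ideal - threshold, ideal + threshold), where ideal is
   * total used / total capacity across all volumes.
   */
  public static List<Integer> outOfRange(long[] used, long[] capacity,
      double thresholdPercentage) {
    long totalUsed = 0;
    long totalCapacity = 0;
    for (int i = 0; i < used.length; i++) {
      totalUsed += used[i];
      totalCapacity += capacity[i];
    }
    double ideal = 100.0 * totalUsed / totalCapacity;
    double lower = ideal - thresholdPercentage;
    double upper = ideal + thresholdPercentage;

    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < used.length; i++) {
      double utilization = 100.0 * used[i] / capacity[i];
      if (utilization < lower || utilization > upper) {
        result.add(i);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    long[] capacity = {100, 100, 100};
    long[] case1 = {30, 30, 40}; // Disk3 is the only volume above the range
    long[] case2 = {30, 30, 20}; // Disk3 is the only volume below the range
    for (double threshold : new double[] {10, 5}) {
      System.out.println("case 1, threshold " + threshold + ": "
          + outOfRange(case1, capacity, threshold));
      System.out.println("case 2, threshold " + threshold + ": "
          + outOfRange(case2, capacity, threshold));
    }
  }
}
```

With threshold 10 both cases return an empty list, and with threshold 5 both return only index 2 (Disk3), which matches the tables above.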



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

