somandal commented on code in PR #15618:
URL: https://github.com/apache/pinot/pull/15618#discussion_r2076151354


##########
pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/rebalance/TableRebalancer.java:
##########
@@ -1352,26 +1380,55 @@ static boolean isExternalViewConverged(String tableNameWithType,
       Map<String, Map<String, String>> externalViewSegmentStates,
       Map<String, Map<String, String>> idealStateSegmentStates, boolean lowDiskMode, boolean bestEfforts,
       @Nullable Set<String> segmentsToMonitor) {
-    return isExternalViewConverged(tableNameWithType, externalViewSegmentStates, idealStateSegmentStates, lowDiskMode,
-        bestEfforts, segmentsToMonitor, LOGGER);
+    return getNumRemainingSegmentsToProcess(tableNameWithType, externalViewSegmentStates, idealStateSegmentStates,
+        lowDiskMode, bestEfforts, segmentsToMonitor, LOGGER, true) == 0;
   }
 
   /**
-   * NOTE:
-   * Only check the segments in the IdealState and being monitored. Extra segments in ExternalView are ignored because
-   * they are not managed by the rebalancer.
-   * For each segment checked:
-   * - In regular mode, it is okay to have extra instances in ExternalView as long as the instance states in IdealState
-   *   are reached.
-   * - In low disk mode, instance states in ExternalView must match IdealState to ensure the segments are deleted from
-   *   server before moving to the next assignment.
-   * For ERROR state in ExternalView, if using best-efforts, log a warning and treat it as good state; if not, throw an
-   * exception to abort the rebalance because we are not able to get out of the ERROR state.
+   * Check if the external view has converged to the ideal state. See `getNumRemainingSegmentsToProcess` for details on
+   * how the convergence is determined.
    */
   private static boolean isExternalViewConverged(String tableNameWithType,
       Map<String, Map<String, String>> externalViewSegmentStates,
       Map<String, Map<String, String>> idealStateSegmentStates, boolean lowDiskMode, boolean bestEfforts,
       @Nullable Set<String> segmentsToMonitor, Logger tableRebalanceLogger) {
+    return getNumRemainingSegmentsToProcess(tableNameWithType, externalViewSegmentStates, idealStateSegmentStates,
+        lowDiskMode, bestEfforts, segmentsToMonitor, tableRebalanceLogger, true) == 0;
+  }
+
+  @VisibleForTesting
+  static int getNumRemainingSegmentsToProcess(String tableNameWithType,
+      Map<String, Map<String, String>> externalViewSegmentStates,
+      Map<String, Map<String, String>> idealStateSegmentStates, boolean lowDiskMode, boolean bestEfforts,
+      @Nullable Set<String> segmentsToMonitor) {
+    return getNumRemainingSegmentsToProcess(tableNameWithType, externalViewSegmentStates, idealStateSegmentStates,
+        lowDiskMode, bestEfforts, segmentsToMonitor, LOGGER, false);
+  }
+
+  /**
+   * Count the number of segments that are not in the expected state. If `earlyReturn=true` it returns 1 as soon as
+   * the count becomes non-zero. This is used to check whether the ExternalView has converged to the IdealState. The
+   * method checks the following:
+   * Only the segments in the IdealState and being monitored. Extra segments in ExternalView are ignored
+   * because they are not managed by the rebalancer.
+   * For each segment, go through instances in the instance map from IdealState and compare it with the one in
+   * ExternalView, and increment the number of remaining segments to process if:
+   * - The instance appears in IS instance map, but there is no instance map in EV, unless the IS instance state is
+   *   OFFLINE
+   * - The instance appears in IS instance map is not in the EV instance map, unless the IS instance state is OFFLINE
+   * - The instance has different states between IS and EV instance map, unless the IS instance state is OFFLINE

Review Comment:
   I think we almost never set a segment to OFFLINE in the IS, except when we reset the segment. If a segment is OFFLINE in the IS, we assume it does not need to be online, since the IS is the expected state the segment should be in. To remove segments, we delete them or remove the server instance for that segment.
   
   OFFLINE in the EV but ONLINE in the IS is different: it indicates that the server might be down or that the segment hasn't completed its state transition yet. Does this help?
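
To make the distinction concrete, here is a minimal sketch of the counting rule as described in the Javadoc above. It is not the actual `TableRebalancer` code: the class `ConvergenceSketch` and the method `countRemainingSegments` are hypothetical names, and the sketch deliberately ignores `lowDiskMode`, `bestEfforts`, `segmentsToMonitor`, early return, and ERROR-state handling. It only shows that an OFFLINE entry in the IS is skipped, while an OFFLINE entry in the EV for an instance that should be ONLINE still counts as remaining work.

```java
import java.util.Map;

// Hypothetical, simplified illustration; not the real TableRebalancer logic.
class ConvergenceSketch {

  // Counts segments whose EV has not yet reached the state requested in the IS.
  // Assumption: both maps are segment name -> (instance name -> state).
  static int countRemainingSegments(Map<String, Map<String, String>> idealState,
      Map<String, Map<String, String>> externalView) {
    int remaining = 0;
    for (Map.Entry<String, Map<String, String>> segment : idealState.entrySet()) {
      // May be null when the segment has no instance map in the EV at all.
      Map<String, String> evInstances = externalView.get(segment.getKey());
      for (Map.Entry<String, String> instance : segment.getValue().entrySet()) {
        String isState = instance.getValue();
        if ("OFFLINE".equals(isState)) {
          // OFFLINE in the IS: the segment is not expected to be online here,
          // so there is nothing to wait for on this instance.
          continue;
        }
        String evState = evInstances == null ? null : evInstances.get(instance.getKey());
        if (!isState.equals(evState)) {
          // Missing in the EV, or in a different state (for example OFFLINE in
          // the EV while ONLINE in the IS): the segment still needs processing.
          remaining++;
          break; // count each segment at most once
        }
      }
    }
    return remaining;
  }
}
```

With an IS of `seg1 -> {s1: OFFLINE}` and `seg2 -> {s1: ONLINE}`, and an EV of `seg2 -> {s1: OFFLINE}`, only `seg2` is counted: `seg1` is skipped because the IS itself asks for OFFLINE, while `seg2` has not yet reached the ONLINE state the IS expects.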



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

