Apache9 commented on code in PR #5534:
URL: https://github.com/apache/hbase/pull/5534#discussion_r1402891739
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ReopenTableRegionsProcedure.java:
##########
@@ -61,20 +69,36 @@ public class ReopenTableRegionsProcedure
private List<HRegionLocation> regions = Collections.emptyList();
+ private List<HRegionLocation> currentRegionBatch = Collections.emptyList();
+
private RetryCounter retryCounter;
+ private final long reopenBatchBackoffMillis;
Review Comment:
A Procedure should not have final fields that are initialized from input: when
loading the procedure from the procedure store, we can only initialize them in
the deserialize method...
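To make the reviewer's point concrete, here is a minimal, self-contained sketch. The class name and the byte[]-based serialize/deserialize signatures are illustrative stand-ins, not the real HBase Procedure API (which works against protobuf streams): the field must stay non-final so that `deserializeStateData` can restore it when the procedure is reloaded from the procedure store after a master restart.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ReopenProcedureSketch {
  // NOT final: it must be assignable both by the constructor (a freshly
  // submitted procedure) and by deserializeStateData (a procedure being
  // recovered from the procedure store, which uses the no-arg constructor).
  private long reopenBatchBackoffMillis;

  ReopenProcedureSketch(long backoffMillis) {
    this.reopenBatchBackoffMillis = backoffMillis;
  }

  // Used when the framework re-creates the procedure from the store.
  ReopenProcedureSketch() {
  }

  // Simplified stand-in for serializing procedure state.
  byte[] serializeStateData() {
    try {
      ByteArrayOutputStream bos = new ByteArrayOutputStream();
      new DataOutputStream(bos).writeLong(reopenBatchBackoffMillis);
      return bos.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Restores the field; this would not compile if the field were final.
  void deserializeStateData(byte[] state) {
    try {
      this.reopenBatchBackoffMillis =
        new DataInputStream(new ByteArrayInputStream(state)).readLong();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  long getReopenBatchBackoffMillis() {
    return reopenBatchBackoffMillis;
  }
}
```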
##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/ReopenTableRegionsProcedure.java:
##########
@@ -139,33 +170,57 @@ protected Flow executeFromState(MasterProcedureEnv env, ReopenTableRegionsState
case REOPEN_TABLE_REGIONS_CONFIRM_REOPENED:
regions =
regions.stream().map(env.getAssignmentManager().getRegionStates()::checkReopened)
.filter(l -> l != null).collect(Collectors.toList());
- if (regions.isEmpty()) {
- return Flow.NO_MORE_STATE;
+ // we need to create a set of region names because the HRegionLocation hashcode is only
+ // based on the server name
+ Set<byte[]> currentRegionBatchNames = currentRegionBatch.stream()
+ .map(r -> r.getRegion().getRegionName()).collect(Collectors.toSet());
+ currentRegionBatch = regions.stream()
+ .filter(r -> currentRegionBatchNames.contains(r.getRegion().getRegionName()))
+ .collect(Collectors.toList());
+ if (currentRegionBatch.isEmpty()) {
+ if (regions.isEmpty()) {
+ return Flow.NO_MORE_STATE;
+ } else {
+ setNextState(ReopenTableRegionsState.REOPEN_TABLE_REGIONS_REOPEN_REGIONS);
+ if (reopenBatchBackoffMillis > 0) {
+ backoff(reopenBatchBackoffMillis);
+ }
+ return Flow.HAS_MORE_STATE;
+ }
}
- if (regions.stream().anyMatch(loc -> canSchedule(env, loc))) {
+ if (currentRegionBatch.stream().anyMatch(loc -> canSchedule(env, loc))) {
retryCounter = null;
setNextState(ReopenTableRegionsState.REOPEN_TABLE_REGIONS_REOPEN_REGIONS);
+ if (reopenBatchBackoffMillis > 0) {
+ backoff(reopenBatchBackoffMillis);
+ }
return Flow.HAS_MORE_STATE;
}
// We can not schedule TRSP for all the regions need to reopen, wait for a while and retry
// again.
if (retryCounter == null) {
retryCounter = ProcedureUtil.createRetryCounter(env.getMasterConfiguration());
}
- long backoff = retryCounter.getBackoffTimeAndIncrementAttempts();
+ long backoffMillis = retryCounter.getBackoffTimeAndIncrementAttempts();
LOG.info(
- "There are still {} region(s) which need to be reopened for table {} are in "
+ "There are still {} region(s) which need to be reopened for table {}. {} are in "
  + "OPENING state, suspend {}secs and try again later",
- regions.size(), tableName, backoff / 1000);
- setTimeout(Math.toIntExact(backoff));
- setState(ProcedureProtos.ProcedureState.WAITING_TIMEOUT);
- skipPersistence();
+ regions.size(), tableName, currentRegionBatch.size(), backoffMillis / 1000);
+ backoff(backoffMillis);
throw new ProcedureSuspendedException();
default:
throw new UnsupportedOperationException("unhandled state=" + state);
}
}
+ private void backoff(long millis) throws ProcedureSuspendedException {
+ setTimeout(Math.toIntExact(millis));
+ setState(ProcedureProtos.ProcedureState.WAITING_TIMEOUT);
+ skipPersistence();
Review Comment:
I think it is OK to skip persistence here, as we do not persist the reopening of
a region in the procedure state; IIRC we use the openSeqNum to determine whether
the region has already been reopened.
The problem is that the old logic was for error retrying, where we could not
schedule TRSPs for some regions. Here we are doing throttling instead, so I do
not think we should keep increasing the retry count and lengthening the retry
interval while scheduling TRSPs...
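A small sketch of the distinction this comment draws, with illustrative names rather than the actual HBase API: the throttling delay between batches is a fixed configured value, while RetryCounter-style error backoff grows with each failed attempt. Routing throttling through the retry path would therefore inflate both the attempt count and the wait time for batches that did not fail at all.

```java
class BackoffPolicySketch {
  private final long batchBackoffMillis; // fixed throttle between batches
  private long retrySleepMillis;         // grows with each error retry
  private int retries;

  BackoffPolicySketch(long batchBackoffMillis, long initialRetrySleepMillis) {
    this.batchBackoffMillis = batchBackoffMillis;
    this.retrySleepMillis = initialRetrySleepMillis;
  }

  // Throttling: the same delay every time, and it does not touch retry state.
  long nextBatchDelay() {
    return batchBackoffMillis;
  }

  // Error retry: doubles the sleep on each attempt, a simplified version of
  // an exponential-backoff RetryCounter (no jitter and no maximum cap here).
  long nextRetryDelayAndIncrement() {
    long delay = retrySleepMillis;
    retrySleepMillis *= 2;
    retries++;
    return delay;
  }

  int getRetries() {
    return retries;
  }
}
```

Keeping the two paths separate means throttled batches never count as failed attempts, which matches the reviewer's objection to reusing the retry machinery for throttling.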
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]