Re: [PR] HBASE-28199 Phase I: Suspend TRSP and SCP when updating meta [hbase]

via GitHub Sun, 03 Dec 2023 06:48:51 -0800


carp84 commented on code in PR #5520:
URL: https://github.com/apache/hbase/pull/5520#discussion_r1413117349



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateNodeLock.java:
##########
@@ -0,0 +1,164 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hbase.master.assignment;
+
+import java.util.concurrent.locks.Condition;
+import java.util.concurrent.locks.Lock;
+import java.util.concurrent.locks.ReentrantLock;
+import org.apache.hadoop.hbase.client.RegionInfo;
+import org.apache.hadoop.hbase.procedure2.Procedure;
+import org.apache.yetus.audience.InterfaceAudience;
+
+/**
+ * A lock implementation which supports unlock by another thread.
+ * <p>
+ * This is because we need to hold region state node lock while updating 
region state to meta(for
+ * keeping consistency), so it is better to yield the procedure to release the 
procedure worker. But
+ * after waking up the procedure, we may use another procedure worker to 
execute the procedure,
+ * which means we need to unlock by another thread. See HBASE-28196 for more 
details.
+ */
+@InterfaceAudience.Private
+class RegionStateNodeLock {
+
+  // for better logging message
+  private final RegionInfo regionInfo;
+
+  private final Lock lock = new ReentrantLock();
+
+  private final Condition cond = lock.newCondition();
+
+  private Object owner;
+
+  private int count;
+
+  RegionStateNodeLock(RegionInfo regionInfo) {
+    this.regionInfo = regionInfo;
+  }
+
+  private void lock0(Object lockBy) {
+    lock.lock();
+    try {
+      for (;;) {
+        if (owner == null) {
+          owner = lockBy;
+          count = 1;
+          return;
+        }
+        if (owner == lockBy) {
+          count++;
+          return;
+        }
+        cond.awaitUninterruptibly();
+      }
+    } finally {
+      lock.unlock();
+    }
+  }
+
+  private boolean tryLock0(Object lockBy) {
+    lock.lock();

Review Comment:
   Maybe we could do a fast fail here if the current thread failed to acquire 
the lock, in addition to the succeeding fast fail when failed to assign the 
`owner` to the given object?
   ```suggestion
       if (!lock.tryLock()) {
         return false;
       }
   ```



##########
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java:
##########
@@ -249,25 +272,31 @@ private void removeMirrorMetaLocation(int 
oldReplicaCount, int newReplicaCount)
     }
   }
 
-  private void updateRegionLocation(RegionInfo regionInfo, State state, Put 
put)
-    throws IOException {
-    try {
-      if (regionInfo.isMetaRegion()) {
+  private CompletableFuture<Void> updateRegionLocation(RegionInfo regionInfo, 
State state,
+    Put put) {
+    CompletableFuture<Void> future;
+    if (regionInfo.isMetaRegion()) {
+      try {
         masterRegion.update(r -> r.put(put));
-      } else {
-        try (Table table = 
master.getConnection().getTable(TableName.META_TABLE_NAME)) {
-          table.put(put);
-        }
+        future = CompletableFuture.completedFuture(null);
+      } catch (Exception e) {
+        future = FutureUtils.failedFuture(e);
       }
-    } catch (IOException e) {
-      // TODO: Revist!!!! Means that if a server is loaded, then we will abort 
our host!
-      // In tests we abort the Master!
-      String msg = String.format("FAILED persisting region=%s state=%s",
-        regionInfo.getShortNameToLog(), state);
-      LOG.error(msg, e);
-      master.abort(msg, e);
-      throw e;
+    } else {
+      AsyncTable<?> table = 
master.getAsyncConnection().getTable(TableName.META_TABLE_NAME);
+      future = table.put(put);
     }
+    FutureUtils.addListener(future, (r, e) -> {
+      if (e != null) {
+        // TODO: Revist!!!! Means that if a server is loaded, then we will 
abort our host!
+        // In tests we abort the Master!

Review Comment:
   It seems this TODO has been existing for some time, shall we make the 
"Revisit" this time, or we could safely remove this TODO?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@hbase.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Re: [PR] HBASE-28199 Phase I: Suspend TRSP and SCP when updating meta [hbase]

Reply via email to