[GitHub] [hbase] infraio commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

GitBox Sat, 12 Sep 2020 13:25:10 -0700


infraio commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r486877715




##########
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
     }
   }
 
+  void takeUserRegionLock() throws IOException {
+    try {
+      long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
       Why move the takeUserRegionLock to inside "try catch block"? The right 
fix is to use operation timeout......

##########
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
     }
   }
 
+  void takeUserRegionLock() throws IOException {
+    try {
+      long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
       I am -1 to use scanner timeout. It is too weird and confused. Operation 
timeout is not the best choice too but better. Thanks.

##########
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
     }
   }
 
+  void takeUserRegionLock() throws IOException {
+    try {
+      long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
       > Reread RpcRetryingCallerImpl#callWithRetries code. If callable.prepare 
throw the LockTimeoutException, it will not retry. There are a check about call 
duration and callTimeout in line 156.
   
   My point here is that your 15 seconds SLA case is not right. It is still 
meet your SLA even you use the operation timeout. I didn't mean that "move 
takeUserRegionLock  to try catch block". Thanks.

##########
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
     }
   }
 
+  void takeUserRegionLock() throws IOException {
+    try {
+      long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
       > If we wait for operation timeout period and if it can't get the lock 
after the timeout, it will not have any time remaining for next attempts. 
   
   Yes. The guarantee is that the operation will fail or success within the 
"operation timeout". No remaining time to retry and failed the operation is 
acceptable. 
   
   > are you suggesting to wait for operation timeout period while trying to 
get lock 
   Yes. Use the operation timeout period when wait for lock, instead of the 
scanner timeout now.
   
   > I think we are going back and forth on which timeout to use.
   I thought my point is clearly since we start this discussion. I suggested 
that use operation timeout instead of scanner timeout. Then you give me a 15 
seconds SLA example. Then I checked the code: use operation timeout can meet 
your SLA requirements, too. So why not use operation timeout?

##########
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
     }
   }
 
+  void takeUserRegionLock() throws IOException {
+    try {
+      long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
       > If we wait for operation timeout period and if it can't get the lock 
after the timeout, it will not have any time remaining for next attempts. 
   
   Yes. The guarantee is that the operation will fail or success within the 
"operation timeout". No remaining time to retry and failed the operation is 
acceptable. 
   
   > are you suggesting to wait for operation timeout period while trying to 
get lock 
   
   Yes. Use the operation timeout period when wait for lock, instead of the 
scanner timeout now.
   
   > I think we are going back and forth on which timeout to use.
   
   I thought my point is clearly since we start this discussion. I suggested 
that use operation timeout instead of scanner timeout. Then you give me a 15 
seconds SLA example. Then I checked the code: use operation timeout can meet 
your SLA requirements, too. So why not use operation timeout?

##########
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
     }
   }
 
+  void takeUserRegionLock() throws IOException {
+    try {
+      long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
       Why move the takeUserRegionLock to inside "try catch block"? The right 
fix is to use operation timeout......

##########
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
     }
   }
 
+  void takeUserRegionLock() throws IOException {
+    try {
+      long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
       I am -1 to use scanner timeout. It is too weird and confused. Operation 
timeout is not the best choice too but better. Thanks.

##########
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
     }
   }
 
+  void takeUserRegionLock() throws IOException {
+    try {
+      long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
       > Reread RpcRetryingCallerImpl#callWithRetries code. If callable.prepare 
throw the LockTimeoutException, it will not retry. There are a check about call 
duration and callTimeout in line 156.
   
   My point here is that your 15 seconds SLA case is not right. It is still 
meet your SLA even you use the operation timeout. I didn't mean that "move 
takeUserRegionLock  to try catch block". Thanks.

##########
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
     }
   }
 
+  void takeUserRegionLock() throws IOException {
+    try {
+      long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
       > If we wait for operation timeout period and if it can't get the lock 
after the timeout, it will not have any time remaining for next attempts. 
   
   Yes. The guarantee is that the operation will fail or success within the 
"operation timeout". No remaining time to retry and failed the operation is 
acceptable. 
   
   > are you suggesting to wait for operation timeout period while trying to 
get lock 
   Yes. Use the operation timeout period when wait for lock, instead of the 
scanner timeout now.
   
   > I think we are going back and forth on which timeout to use.
   I thought my point is clearly since we start this discussion. I suggested 
that use operation timeout instead of scanner timeout. Then you give me a 15 
seconds SLA example. Then I checked the code: use operation timeout can meet 
your SLA requirements, too. So why not use operation timeout?

##########
File path: 
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName 
tableName, byte[] row, bool
     }
   }
 
+  void takeUserRegionLock() throws IOException {
+    try {
+      long waitTime = connectionConfig.getScannerTimeoutPeriod();

Review comment:
       > If we wait for operation timeout period and if it can't get the lock 
after the timeout, it will not have any time remaining for next attempts. 
   
   Yes. The guarantee is that the operation will fail or success within the 
"operation timeout". No remaining time to retry and failed the operation is 
acceptable. 
   
   > are you suggesting to wait for operation timeout period while trying to 
get lock 
   
   Yes. Use the operation timeout period when wait for lock, instead of the 
scanner timeout now.
   
   > I think we are going back and forth on which timeout to use.
   
   I thought my point is clearly since we start this discussion. I suggested 
that use operation timeout instead of scanner timeout. Then you give me a 15 
seconds SLA example. Then I checked the code: use operation timeout can meet 
your SLA requirements, too. So why not use operation timeout?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [hbase] infraio commented on a change in pull request #2322: [HBASE-24956] ConnectionManager#locateRegionInMeta waits for user region lock indefinitely.

Reply via email to