infraio commented on a change in pull request #2322: URL: https://github.com/apache/hbase/pull/2322#discussion_r486877715
########## File path: hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java ########## @@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, bool } } + void takeUserRegionLock() throws IOException { + try { + long waitTime = connectionConfig.getScannerTimeoutPeriod(); Review comment: Why move the takeUserRegionLock to inside "try catch block"? The right fix is to use operation timeout...... ########## File path: hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java ########## @@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, bool } } + void takeUserRegionLock() throws IOException { + try { + long waitTime = connectionConfig.getScannerTimeoutPeriod(); Review comment: I am -1 to use scanner timeout. It is too weird and confused. Operation timeout is not the best choice too but better. Thanks. ########## File path: hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java ########## @@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, bool } } + void takeUserRegionLock() throws IOException { + try { + long waitTime = connectionConfig.getScannerTimeoutPeriod(); Review comment: > Reread RpcRetryingCallerImpl#callWithRetries code. If callable.prepare throw the LockTimeoutException, it will not retry. There are a check about call duration and callTimeout in line 156. My point here is that your 15 seconds SLA case is not right. It is still meet your SLA even you use the operation timeout. I didn't mean that "move takeUserRegionLock to try catch block". Thanks. ########## File path: hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java ########## @@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, bool } } + void takeUserRegionLock() throws IOException { + try { + long waitTime = connectionConfig.getScannerTimeoutPeriod(); Review comment: > If we wait for operation timeout period and if it can't get the lock after the timeout, it will not have any time remaining for next attempts. Yes. The guarantee is that the operation will fail or success within the "operation timeout". No remaining time to retry and failed the operation is acceptable. > are you suggesting to wait for operation timeout period while trying to get lock Yes. Use the operation timeout period when wait for lock, instead of the scanner timeout now. > I think we are going back and forth on which timeout to use. I thought my point is clearly since we start this discussion. I suggested that use operation timeout instead of scanner timeout. Then you give me a 15 seconds SLA example. Then I checked the code: use operation timeout can meet your SLA requirements, too. So why not use operation timeout? ########## File path: hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java ########## @@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, bool } } + void takeUserRegionLock() throws IOException { + try { + long waitTime = connectionConfig.getScannerTimeoutPeriod(); Review comment: > If we wait for operation timeout period and if it can't get the lock after the timeout, it will not have any time remaining for next attempts. Yes. The guarantee is that the operation will fail or success within the "operation timeout". No remaining time to retry and failed the operation is acceptable. > are you suggesting to wait for operation timeout period while trying to get lock Yes. Use the operation timeout period when wait for lock, instead of the scanner timeout now. > I think we are going back and forth on which timeout to use. I thought my point is clearly since we start this discussion. I suggested that use operation timeout instead of scanner timeout. Then you give me a 15 seconds SLA example. Then I checked the code: use operation timeout can meet your SLA requirements, too. So why not use operation timeout? ########## File path: hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java ########## @@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, bool } } + void takeUserRegionLock() throws IOException { + try { + long waitTime = connectionConfig.getScannerTimeoutPeriod(); Review comment: Why move the takeUserRegionLock to inside "try catch block"? The right fix is to use operation timeout...... ########## File path: hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java ########## @@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, bool } } + void takeUserRegionLock() throws IOException { + try { + long waitTime = connectionConfig.getScannerTimeoutPeriod(); Review comment: I am -1 to use scanner timeout. It is too weird and confused. Operation timeout is not the best choice too but better. Thanks. ########## File path: hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java ########## @@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, bool } } + void takeUserRegionLock() throws IOException { + try { + long waitTime = connectionConfig.getScannerTimeoutPeriod(); Review comment: > Reread RpcRetryingCallerImpl#callWithRetries code. If callable.prepare throw the LockTimeoutException, it will not retry. There are a check about call duration and callTimeout in line 156. My point here is that your 15 seconds SLA case is not right. It is still meet your SLA even you use the operation timeout. I didn't mean that "move takeUserRegionLock to try catch block". Thanks. ########## File path: hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java ########## @@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, bool } } + void takeUserRegionLock() throws IOException { + try { + long waitTime = connectionConfig.getScannerTimeoutPeriod(); Review comment: > If we wait for operation timeout period and if it can't get the lock after the timeout, it will not have any time remaining for next attempts. Yes. The guarantee is that the operation will fail or success within the "operation timeout". No remaining time to retry and failed the operation is acceptable. > are you suggesting to wait for operation timeout period while trying to get lock Yes. Use the operation timeout period when wait for lock, instead of the scanner timeout now. > I think we are going back and forth on which timeout to use. I thought my point is clearly since we start this discussion. I suggested that use operation timeout instead of scanner timeout. Then you give me a 15 seconds SLA example. Then I checked the code: use operation timeout can meet your SLA requirements, too. So why not use operation timeout? ########## File path: hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java ########## @@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName tableName, byte[] row, bool } } + void takeUserRegionLock() throws IOException { + try { + long waitTime = connectionConfig.getScannerTimeoutPeriod(); Review comment: > If we wait for operation timeout period and if it can't get the lock after the timeout, it will not have any time remaining for next attempts. Yes. The guarantee is that the operation will fail or success within the "operation timeout". No remaining time to retry and failed the operation is acceptable. > are you suggesting to wait for operation timeout period while trying to get lock Yes. Use the operation timeout period when wait for lock, instead of the scanner timeout now. > I think we are going back and forth on which timeout to use. I thought my point is clearly since we start this discussion. I suggested that use operation timeout instead of scanner timeout. Then you give me a 15 seconds SLA example. Then I checked the code: use operation timeout can meet your SLA requirements, too. So why not use operation timeout? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org