infraio commented on a change in pull request #2322:
URL: https://github.com/apache/hbase/pull/2322#discussion_r486877715
##########
File path:
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName
tableName, byte[] row, bool
}
}
+ void takeUserRegionLock() throws IOException {
+ try {
+ long waitTime = connectionConfig.getScannerTimeoutPeriod();
Review comment:
Why move the takeUserRegionLock to inside "try catch block"? The right
fix is to use operation timeout......
##########
File path:
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName
tableName, byte[] row, bool
}
}
+ void takeUserRegionLock() throws IOException {
+ try {
+ long waitTime = connectionConfig.getScannerTimeoutPeriod();
Review comment:
I am -1 to use scanner timeout. It is too weird and confused. Operation
timeout is not the best choice too but better. Thanks.
##########
File path:
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName
tableName, byte[] row, bool
}
}
+ void takeUserRegionLock() throws IOException {
+ try {
+ long waitTime = connectionConfig.getScannerTimeoutPeriod();
Review comment:
> Reread RpcRetryingCallerImpl#callWithRetries code. If callable.prepare
throw the LockTimeoutException, it will not retry. There are a check about call
duration and callTimeout in line 156.
My point here is that your 15 seconds SLA case is not right. It is still
meet your SLA even you use the operation timeout. I didn't mean that "move
takeUserRegionLock to try catch block". Thanks.
##########
File path:
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName
tableName, byte[] row, bool
}
}
+ void takeUserRegionLock() throws IOException {
+ try {
+ long waitTime = connectionConfig.getScannerTimeoutPeriod();
Review comment:
> If we wait for operation timeout period and if it can't get the lock
after the timeout, it will not have any time remaining for next attempts.
Yes. The guarantee is that the operation will fail or success within the
"operation timeout". No remaining time to retry and failed the operation is
acceptable.
> are you suggesting to wait for operation timeout period while trying to
get lock
Yes. Use the operation timeout period when wait for lock, instead of the
scanner timeout now.
> I think we are going back and forth on which timeout to use.
I thought my point is clearly since we start this discussion. I suggested
that use operation timeout instead of scanner timeout. Then you give me a 15
seconds SLA example. Then I checked the code: use operation timeout can meet
your SLA requirements, too. So why not use operation timeout?
##########
File path:
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName
tableName, byte[] row, bool
}
}
+ void takeUserRegionLock() throws IOException {
+ try {
+ long waitTime = connectionConfig.getScannerTimeoutPeriod();
Review comment:
> If we wait for operation timeout period and if it can't get the lock
after the timeout, it will not have any time remaining for next attempts.
Yes. The guarantee is that the operation will fail or success within the
"operation timeout". No remaining time to retry and failed the operation is
acceptable.
> are you suggesting to wait for operation timeout period while trying to
get lock
Yes. Use the operation timeout period when wait for lock, instead of the
scanner timeout now.
> I think we are going back and forth on which timeout to use.
I thought my point is clearly since we start this discussion. I suggested
that use operation timeout instead of scanner timeout. Then you give me a 15
seconds SLA example. Then I checked the code: use operation timeout can meet
your SLA requirements, too. So why not use operation timeout?
##########
File path:
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName
tableName, byte[] row, bool
}
}
+ void takeUserRegionLock() throws IOException {
+ try {
+ long waitTime = connectionConfig.getScannerTimeoutPeriod();
Review comment:
Why move the takeUserRegionLock to inside "try catch block"? The right
fix is to use operation timeout......
##########
File path:
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName
tableName, byte[] row, bool
}
}
+ void takeUserRegionLock() throws IOException {
+ try {
+ long waitTime = connectionConfig.getScannerTimeoutPeriod();
Review comment:
I am -1 to use scanner timeout. It is too weird and confused. Operation
timeout is not the best choice too but better. Thanks.
##########
File path:
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName
tableName, byte[] row, bool
}
}
+ void takeUserRegionLock() throws IOException {
+ try {
+ long waitTime = connectionConfig.getScannerTimeoutPeriod();
Review comment:
> Reread RpcRetryingCallerImpl#callWithRetries code. If callable.prepare
throw the LockTimeoutException, it will not retry. There are a check about call
duration and callTimeout in line 156.
My point here is that your 15 seconds SLA case is not right. It is still
meet your SLA even you use the operation timeout. I didn't mean that "move
takeUserRegionLock to try catch block". Thanks.
##########
File path:
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName
tableName, byte[] row, bool
}
}
+ void takeUserRegionLock() throws IOException {
+ try {
+ long waitTime = connectionConfig.getScannerTimeoutPeriod();
Review comment:
> If we wait for operation timeout period and if it can't get the lock
after the timeout, it will not have any time remaining for next attempts.
Yes. The guarantee is that the operation will fail or success within the
"operation timeout". No remaining time to retry and failed the operation is
acceptable.
> are you suggesting to wait for operation timeout period while trying to
get lock
Yes. Use the operation timeout period when wait for lock, instead of the
scanner timeout now.
> I think we are going back and forth on which timeout to use.
I thought my point is clearly since we start this discussion. I suggested
that use operation timeout instead of scanner timeout. Then you give me a 15
seconds SLA example. Then I checked the code: use operation timeout can meet
your SLA requirements, too. So why not use operation timeout?
##########
File path:
hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java
##########
@@ -968,6 +974,19 @@ private RegionLocations locateRegionInMeta(TableName
tableName, byte[] row, bool
}
}
+ void takeUserRegionLock() throws IOException {
+ try {
+ long waitTime = connectionConfig.getScannerTimeoutPeriod();
Review comment:
> If we wait for operation timeout period and if it can't get the lock
after the timeout, it will not have any time remaining for next attempts.
Yes. The guarantee is that the operation will fail or success within the
"operation timeout". No remaining time to retry and failed the operation is
acceptable.
> are you suggesting to wait for operation timeout period while trying to
get lock
Yes. Use the operation timeout period when wait for lock, instead of the
scanner timeout now.
> I think we are going back and forth on which timeout to use.
I thought my point is clearly since we start this discussion. I suggested
that use operation timeout instead of scanner timeout. Then you give me a 15
seconds SLA example. Then I checked the code: use operation timeout can meet
your SLA requirements, too. So why not use operation timeout?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]