[ https://issues.apache.org/jira/browse/HBASE-27961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nihal Jain updated HBASE-27961:
-------------------------------
    Description: 
Running the assigns command with a very large list of regions fails with a 
CallTimeoutException (CTE). Splitting the input across multiple files does not 
help: the command still fails, and we have to blindly resubmit the same command 
until it completes without error.

The exception seen is as follows: 
{code:java}
Exception in thread "main" java.io.IOException: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=SOME_HOST_NAME:SOME_PORT_NUMBER failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
        at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:146)
        at org.apache.hbase.HBCK2.assigns(HBCK2.java:454)
        at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:1070)
        at org.apache.hbase.HBCK2.run(HBCK2.java:1028)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hbase.HBCK2.main(HBCK2.java:1367)
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=SOME_HOST_NAME:SOME_PORT_NUMBER failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:340)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:92)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:594)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$BlockingStub.assigns(MasterProtos.java)
        at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:141)
        ... 6 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=SOME_HOST_NAME:SOME_PORT_NUMBER failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
        at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:222)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:424)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:419)
        at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:107)
        at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:134)
        at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
        at org.apache.hbase.thirdparty.io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
        at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
        at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
        at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
        at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
        at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:135)
        ... 6 more

{code}

The same issue likely applies to most other commands, such as unassigns and 
reportMissingRegionsInMeta.

*Proposed fix*
 * Introduce batching when a list of regions is passed on the command line.
 * When input files are specified via the -i argument, treat each individual 
file as a batch.
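
The batching idea in the list above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual HBCK2 change: the class RegionBatcher, its methods, and the batch size are hypothetical names introduced here, and the rpc parameter stands in for whatever call (for example, assigns on one batch of regions) the tool would make per batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper (not part of HBCK2): splits a long region list into batches. */
final class RegionBatcher {

  private RegionBatcher() {}

  /** Returns consecutive sublists of at most batchSize elements, preserving order. */
  static <T> List<List<T>> partition(List<T> items, int batchSize) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive: " + batchSize);
    }
    List<List<T>> batches = new ArrayList<>();
    for (int start = 0; start < items.size(); start += batchSize) {
      int end = Math.min(start + batchSize, items.size());
      // Copy the view so each batch is independent of the source list.
      batches.add(new ArrayList<>(items.subList(start, end)));
    }
    return batches;
  }

  /**
   * Invokes rpc once per batch and concatenates the per-batch results.
   * Assumption: rpc wraps a single call for one batch of regions, so each
   * call carries a bounded amount of work instead of the whole list.
   */
  static <R> List<R> runInBatches(List<String> regions, int batchSize,
      Function<List<String>, List<R>> rpc) {
    List<R> results = new ArrayList<>();
    for (List<String> batch : partition(regions, batchSize)) {
      results.addAll(rpc.apply(batch));
    }
    return results;
  }
}
```

With per-file batching, the same runInBatches loop would simply receive one pre-built batch per input file instead of calling partition. A suitable batch size is not stated in this issue and would need to be chosen against the configured RPC timeout.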


  was:
While trying to run assigns command with a huge list of region, it fails with 
CTE. Even on trying to run it by breaking input into multiple files, it still 
fails and have to blindly submit same command again and again until no error.

Exception seen as described above is as follows: 
{code:java}
Exception in thread "main" java.io.IOException: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=SOME_HOST_NAME:SOME_PORT_NUMBER failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
        at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:146)
        at org.apache.hbase.HBCK2.assigns(HBCK2.java:454)
        at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:1070)
        at org.apache.hbase.HBCK2.run(HBCK2.java:1028)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
        at org.apache.hbase.HBCK2.main(HBCK2.java:1367)
Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=SOME_HOST_NAME:SOME_PORT_NUMBER failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:340)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:92)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:594)
        at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$BlockingStub.assigns(MasterProtos.java)
        at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:141)
        ... 6 more
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=SOME_HOST_NAME:SOME_PORT_NUMBER failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
        at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:222)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:424)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:419)
        at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:107)
        at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:134)
        at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
        at org.apache.hbase.thirdparty.io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
        at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
        at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
        at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
        at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
        at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:135)
        ... 6 more

{code}
Proposed fixed:
 * This can be fixed by introducing batching if a list of regions is passed via 
commandline.
 * Also in case files are specified via -i arg, we could treat each individual 
file as a batch.


> [HBCK2] Running assigns/unassigns command with large number of files/regions 
> throws CallTimeoutException
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-27961
>                 URL: https://issues.apache.org/jira/browse/HBASE-27961
>             Project: HBase
>          Issue Type: Bug
>          Components: hbck2
>            Reporter: Nihal Jain
>            Assignee: Nihal Jain
>            Priority: Major
>
> Running the assigns command with a very large list of regions fails with a 
> CallTimeoutException (CTE). Splitting the input across multiple files does 
> not help: the command still fails, and we have to blindly resubmit the same 
> command until it completes without error.
> The exception seen is as follows: 
> {code:java}
> Exception in thread "main" java.io.IOException: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=SOME_HOST_NAME:SOME_PORT_NUMBER failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
>       at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:146)
>       at org.apache.hbase.HBCK2.assigns(HBCK2.java:454)
>       at org.apache.hbase.HBCK2.doCommandLine(HBCK2.java:1070)
>       at org.apache.hbase.HBCK2.run(HBCK2.java:1028)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>       at org.apache.hbase.HBCK2.main(HBCK2.java:1367)
> Caused by: org.apache.hbase.thirdparty.com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=SOME_HOST_NAME:SOME_PORT_NUMBER failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
>       at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:340)
>       at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:92)
>       at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:594)
>       at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$HbckService$BlockingStub.assigns(MasterProtos.java)
>       at org.apache.hadoop.hbase.client.HBaseHbck.assigns(HBaseHbck.java:141)
>       ... 6 more
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=SOME_HOST_NAME:SOME_PORT_NUMBER failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
>       at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:222)
>       at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
>       at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
>       at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:424)
>       at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:419)
>       at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:107)
>       at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:134)
>       at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.run(HashedWheelTimer.java:715)
>       at org.apache.hbase.thirdparty.io.netty.util.concurrent.ImmediateExecutor.execute(ImmediateExecutor.java:34)
>       at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:703)
>       at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:790)
>       at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:503)
>       at java.lang.Thread.run(Thread.java:750)
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=0,methodName=Assigns], waitTime=90142ms, rpcTimeout=90000ms
>       at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:135)
>       ... 6 more
> {code}
> The same issue likely applies to most other commands, such as unassigns and 
> reportMissingRegionsInMeta.
> *Proposed fix*
>  * Introduce batching when a list of regions is passed on the command line.
>  * When input files are specified via the -i argument, treat each individual 
> file as a batch.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
