[jira] [Comment Edited] (HBASE-25735) Add target Region to connection exceptions

Andrew Kyle Purtell (Jira) Thu, 08 Apr 2021 12:15:29 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-25735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317434#comment-17317434
 ]


Andrew Kyle Purtell edited comment on HBASE-25735 at 4/8/21, 7:14 PM:
----------------------------------------------------------------------

[~stack] Would you be up for an addendum that restores these methods on 
RpcControllerFactory for compatibility? I think we want to pursue making 
RpcControllerFactory and HBaseRpcController both LP(CONFIG,PHOENIX), as they 
should have been 4 years ago. 

{code}
public HBaseRpcController newController(final CellScanner cellScanner);
{code}

and

{code}
public HBaseRpcController newController(final List<CellScannable> cellIterables);
{code}

Elsewhere the logging improvements already do a null check for regioninfo. 

Phoenix's MetadataRpcController sets RPC priority for its SYSTEM tables. Their 
IndexRpcController also sets a higher priority for index updates, so that index 
updates are processed in a separate queue from base table updates. I don't see 
how to do that elsewhere without making a new plug point for priority 
calculation, and that is not inherently better than supporting the extension of 
these interfaces. 
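
For illustration only, here is a minimal sketch of the extension pattern I mean: a factory wraps each controller it hands out, and the wrapper raises the priority for selected tables. The Controller and ControllerFactory interfaces below are hypothetical stand-ins, not the real HBaseRpcController and RpcControllerFactory; the priority values and the table name are likewise assumptions, and this is not Phoenix's actual code.

```java
import java.util.Collections;
import java.util.Set;

public class PriorityControllerSketch {

  // Hypothetical stand-in for the small part of an RPC controller used here.
  interface Controller {
    void setPriority(int priority);
    void setPriority(String tableName);
    int getPriority();
  }

  // Hypothetical stand-in for a controller factory plug point.
  interface ControllerFactory {
    Controller newController();
  }

  // Assumed values for the sketch; not HBase constants.
  static final int NORMAL_PRIORITY = 0;
  static final int INDEX_PRIORITY = 1000;

  // Baseline controller: every table gets the normal priority.
  static class DefaultController implements Controller {
    private int priority = NORMAL_PRIORITY;
    @Override public void setPriority(int priority) { this.priority = priority; }
    @Override public void setPriority(String tableName) { this.priority = NORMAL_PRIORITY; }
    @Override public int getPriority() { return priority; }
  }

  // Delegating controller: raises priority when the target is an index table.
  static class IndexPriorityController implements Controller {
    private final Controller delegate;
    private final Set<String> indexTables;

    IndexPriorityController(Controller delegate, Set<String> indexTables) {
      this.delegate = delegate;
      this.indexTables = indexTables;
    }

    @Override public void setPriority(int priority) { delegate.setPriority(priority); }

    @Override public void setPriority(String tableName) {
      if (indexTables.contains(tableName)) {
        delegate.setPriority(INDEX_PRIORITY);
      } else {
        delegate.setPriority(tableName);
      }
    }

    @Override public int getPriority() { return delegate.getPriority(); }
  }

  // Factory that wraps whatever the base factory produces.
  static class IndexPriorityFactory implements ControllerFactory {
    private final ControllerFactory base;
    private final Set<String> indexTables;

    IndexPriorityFactory(ControllerFactory base, Set<String> indexTables) {
      this.base = base;
      this.indexTables = indexTables;
    }

    @Override public Controller newController() {
      return new IndexPriorityController(base.newController(), indexTables);
    }
  }

  public static void main(String[] args) {
    ControllerFactory factory =
        new IndexPriorityFactory(DefaultController::new, Collections.singleton("MY_INDEX"));
    Controller controller = factory.newController();
    controller.setPriority("MY_INDEX");
    System.out.println("priority for MY_INDEX: " + controller.getPriority());
  }
}
```

The point of the sketch is that the priority decision lives entirely in the wrapper, so callers that only know about the factory need no changes. That is why keeping the factory and controller extensible covers the same ground as a dedicated priority plug point would.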


was (Author: apurtell):
[~stack] Would you be up for an addendum that puts back these methods for 
compat? I think we want to pursue making RpcControllerFactory and 
HBaseRpcController both LP(CONFIG,PHOENIX) as it should have been 4 years ago. 

{code}
public HBaseRpcController newController(final CellScanner cellScanner);
{code}

and

{code}
public HBaseRpcController newController(final List<CellScannable> cellIterables);
{code}

Elsewhere the logging improvements already do a null check for regioninfo. 

Phoenix's MetadataRpcController sets RPC priority for its SYSTEM tables. Their 
IndexRpcController also sets priority higher for index updates so index updates 
are processed in a separate queue from base table updates. I don't see how to 
do that elsewhere without making a new plug point for priority calculation. 
That is not inherently better than supporting the extension of these 
interfaces. 

> Add target Region to connection exceptions
> ------------------------------------------
>
>                 Key: HBASE-25735
>                 URL: https://issues.apache.org/jira/browse/HBASE-25735
>             Project: HBase
>          Issue Type: Bug
>          Components: rpc
>            Reporter: Michael Stack
>            Assignee: Michael Stack
>            Priority: Major
>             Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> We spent a bit of time making it so exceptions included the remote host name. 
> Looks like we can add the target Region name too with a bit of manipulation; 
> this will help in identifying hot-spotting or a problem Region on the server 
> side. For example, here is what I was seeing recently on the client side when 
> a RS was timing out requests:
> {code}
> 2021-04-06T02:18:23.533Z, RpcRetryingCaller{globalStartTime=1617675482894, pause=100, maxAttempts=4}, org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to ps0989.example.org/1.1.1.1:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
>         at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:145)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:383)
>         at org.apache.hadoop.hbase.client.HTable.get(HTable.java:357)
> ...
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to ps0989.bot.parsec.apple.com/17.58.114.206:16020 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
>         at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:209)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:378)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:89)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:409)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:405)
>         at org.apache.hadoop.hbase.ipc.Call.setTimeout(Call.java:110)
>         at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:136)
>         at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
>         at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
>         at org.apache.hbase.thirdparty.io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
>         ... 1 more
> Caused by: org.apache.hadoop.hbase.ipc.CallTimeoutException: Call[id=88369369,methodName=Get], waitTime=5006, rpcTimeout=5000
>         at org.apache.hadoop.hbase.ipc.RpcConnection$1.run(RpcConnection.java:137)
>         ... 4 more
> {code}
> I wanted the Region it was hitting. I wanted to know if it was a server 
> problem or a Region issue. If clients were only having issues with one 
> Region, then I could focus on it.
> After the PR, the exception (from another context) looks like this:
> {code}
> org.apache.hadoop.hbase.ipc.CallTimeoutException: Call to address=127.0.0.1:12345, regionInfo=hbase:meta,,1.1588230740 failed on local exception: org.apache.hadoop.hbase.ipc.CallTimeoutException: error
> ....
> {code}
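
As a rough sketch of how a message in that shape could be assembled, including the null check on the region mentioned in the comment above: the class and method names here are hypothetical and are not the code from the PR. Omitting the regionInfo part when no region is known is an assumption of the sketch.

```java
import java.io.IOException;

public class CallTargetSketch {

  // Describes the call target; the region part is skipped when it is unknown.
  // Hypothetical helper, not an HBase API.
  static String describeTarget(String address, String regionName) {
    StringBuilder sb = new StringBuilder("address=").append(address);
    if (regionName != null) {
      sb.append(", regionInfo=").append(regionName);
    }
    return sb.toString();
  }

  // Builds the wrapped exception message around the original cause.
  static String wrapMessage(String address, String regionName, Throwable cause) {
    return "Call to " + describeTarget(address, regionName)
        + " failed on local exception: " + cause;
  }

  public static void main(String[] args) {
    IOException cause = new IOException("error");
    // With a region: the message names both the server and the Region.
    System.out.println(wrapMessage("127.0.0.1:12345", "hbase:meta,,1.1588230740", cause));
    // Without a region (for example, a call that is not region-scoped).
    System.out.println(wrapMessage("127.0.0.1:12345", null, cause));
  }
}
```

Keeping the null check in one helper means callers that have no region to report can pass null and still get a well-formed message.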



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
