[
https://issues.apache.org/jira/browse/HBASE-26812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17507628#comment-17507628
]
chenglei edited comment on HBASE-26812 at 3/16/22, 2:00 PM:
------------------------------------------------------------
[~zhangduo], the scenario is this: while {{RSRpcServices.scan}} or
{{RSRpcServices.get}} is serving a remote rpc call, a region CP hook such as
{{RegionObserver.postScannerOpen}} may directly invoke {{RSRpcServices.scan}}
or {{RSRpcServices.get}} on the same RegionServer through
{{ShortCircuitingClusterConnection}} to scan other rows. Because an outer
{{RpcContext}} is already set, the {{RegionScanner}} created for the direct
{{RSRpcServices.scan}} or {{RSRpcServices.get}} cannot be closed until the
outer rpc call completes. Even worse, the {{ServerCall.rpcCallback}} may be
overridden, which would cause serious problems.
A simple fix I can think of is in
{{ShortCircuitingClusterConnection.getClient}}: if it returns
{{ShortCircuitingClusterConnection.localHostClient}}, we could wrap it in a
wrapper class that surrounds the {{scan}} and {{get}} method calls with
{{RpcUtil.setRpcContext(null)}} and {{RpcUtil.setRpcContext(oldRpcCall)}}.
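To make the idea concrete, here is a rough sketch of the save / clear / restore
pattern the wrapper would apply. It is not HBase code: {{CURRENT_CALL}} is a
hypothetical thread-local standing in for the outer {{RpcContext}}, and
{{withoutRpcContext}} is an illustrative helper name, so treat it only as an
outline of the approach.

```java
import java.util.function.Supplier;

public class RpcContextSketch {
    // Hypothetical stand-in for the per-thread outer rpc context (not the HBase API).
    static final ThreadLocal<String> CURRENT_CALL = new ThreadLocal<>();

    /**
     * Runs a short-circuited local call with the outer context cleared,
     * and restores the outer context afterwards, even if the call throws.
     */
    static <T> T withoutRpcContext(Supplier<T> localCall) {
        String oldRpcCall = CURRENT_CALL.get();   // remember the outer call
        CURRENT_CALL.set(null);                   // inner call sees no outer context
        try {
            return localCall.get();
        } finally {
            CURRENT_CALL.set(oldRpcCall);         // put the outer call back
        }
    }

    public static void main(String[] args) {
        CURRENT_CALL.set("outer-scan");
        // The inner call observes no context while it runs.
        String seenInside = withoutRpcContext(() -> String.valueOf(CURRENT_CALL.get()));
        System.out.println("inside: " + seenInside);
        System.out.println("after:  " + CURRENT_CALL.get());
    }
}
```

The {{finally}} block matters here: if the inner {{scan}} or {{get}} throws, the
outer call must still get its context back, otherwise the outer rpc would lose
its own cleanup.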
> ShortCircuitingClusterConnection fails to close RegionScanners when making
> short-circuited calls
> ------------------------------------------------------------------------------------------------
>
> Key: HBASE-26812
> URL: https://issues.apache.org/jira/browse/HBASE-26812
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.4.9
> Reporter: Lars Hofhansl
> Priority: Critical
>
> Just ran into this on the Phoenix side.
> We retrieve a Connection via
> {{RegionCoprocessorEnvironment.createConnection... getTable(...)}}. And
> then call get on that table. The Get's key happens to be local. Now each call
> to table.get() leaves an open StoreScanner around forever. (verified with a
> memory profiler).
> These references are held via
> RegionScannerImpl.storeHeap.scannersForDelayedClose. Eventually the
> RegionServer goes into a GC of death and can only be ended with kill -9.
> The reason appears to be that in this case there is no currentCall context.
> Some time in 2.x the Rpc handler/call was made responsible for closing open
> region scanners, but we forgot to handle {{ShortCircuitingClusterConnection}}.
> It's not immediately clear how to fix this. But it does make
> ShortCircuitingClusterConnection useless and dangerous. If you use it, you
> *will* create a giant memory leak.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)