[ 
https://issues.apache.org/jira/browse/HBASE-28663?focusedWorklogId=923552&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-923552
 ]

David Manning logged work on HBASE-28663:
-----------------------------------------

                Author: David Manning
            Created on: 14/Jun/24 17:05
            Start Date: 14/Jun/24 17:05
    Worklog Time Spent: 24h 

Issue Time Tracking
-------------------

            Worklog Id:     (was: 923552)
    Remaining Estimate: 0h  (was: 24h)
            Time Spent: 24h

> CanaryTool continues executing and scanning after timeout
> ---------------------------------------------------------
>
>                 Key: HBASE-28663
>                 URL: https://issues.apache.org/jira/browse/HBASE-28663
>             Project: HBase
>          Issue Type: Bug
>          Components: canary
>    Affects Versions: 2.0.0, 3.0.0
>            Reporter: David Manning
>            Assignee: David Manning
>            Priority: Minor
>              Labels: pull-request-available
>   Original Estimate: 24h
>          Time Spent: 24h
>  Remaining Estimate: 0h
>
> If you run the {{CanaryTool}} in region mode until it reaches the configured 
> timeout, the logs and sink results will show that it can continue executing 
> and scanning for 10 seconds.
> This is because the RegionTasks have already been submitted to an 
> ExecutorService which continues execution after timeout, and the Monitor 
> continues execution on a separate thread.
> The 10 second delay in shutdown is seen, in hbase 2.x at least, because 
> {{runMonitor}} will close the {{Connection}} and that process 
> ([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/CanaryTool.java#L1054-L1094])
>  will lead to {{ConnectionImplementation#close}} 
> ([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L2272-L2300])
>  and inside {{shutdownPools}} we will potentially wait the full 10 seconds of 
> {{awaitTermination}} if client operations are in progress.
> The scenario can be improved by simply interrupting the monitor thread, as we 
> will often be in an {{invokeAll}} call in a {{sniff}} method. The 
> {{invokeAll}} method is blocking, and interrupting the monitor in this call 
> will interrupt the client threads and generally shutdown properly and timely. 
> However, we can be more robust by also watching for a shutdown signal in the 
> various tasks such as {{RegionTask}} so any remaining tasks will drain 
> quickly and without errors.
>  
> {code:java}
> 2024-06-12 02:57:14 [Time-limited test] ERROR tool.Canary(1076): The monitor 
> is running too long (1140098) after timeout limit:1140000 will be killed 
> itself !!
> 2024-06-12 02:57:14 [Time-limited test] INFO 
> client.ConnectionImplementation(2039): Closing master protocol: MasterService
> 2024-06-12 02:57:14 [pool-3-thread-4] ERROR tool.Canary(353): Read from 
> REGION1. on serverName=REGIONSERVER-1, columnFamily=0 failed
> java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
> Task 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@54f2a9a4
>  rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting 
> down, pool size = 7, active threads = 7, queued tasks = 0, completed tasks = 
> 180094]
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:199)
>       at 
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:271)
>       at 
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:440)
>       at 
> org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:314)
>       at 
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:612)
>       at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.readColumnFamily(CanaryTool.java:565)
>       at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:609)
>       at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503)
>       at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471)
> [... repeats for 10 seconds and tens of thousands of regions ... ]
> 2024-06-12 02:57:16 [pool-3-thread-11] ERROR tool.Canary(353): Read from 
> REGION10000. on serverName=REGIONSERVER-2, columnFamily=0 failed
> java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
> Task 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@d08d21f
>  rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Shutting 
> down, pool size = 3, active threads = 3, queued tasks = 0, completed tasks = 
> 180098]
> [...]
> 2024-06-12 02:57:24 [pool-3-thread-11] ERROR tool.Canary(353): Read from 
> REGION42000. on serverName=REGIONSERVER-3, columnFamily=0 failed
> java.lang.RuntimeException: java.util.concurrent.RejectedExecutionException: 
> Task 
> org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture@38e7a5a1
>  rejected from java.util.concurrent.ThreadPoolExecutor@2d3d204d[Terminated, 
> pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 180101]
> 2024-06-12T02:57:24.202Z, java.io.InterruptedIOException
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.throwEnrichedException(RpcRetryingCallerWithReadReplicas.java:294)
>       at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:255)
>       at 
> org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:53)
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithoutRetries(RpcRetryingCallerImpl.java:191)
>       at 
> org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:271)
>       at 
> org.apache.hadoop.hbase.client.ClientScanner.loadCache(ClientScanner.java:440)
>       at 
> org.apache.hadoop.hbase.client.ClientScanner.nextWithSyncCache(ClientScanner.java:314)
>       at 
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:612)
>       at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.readColumnFamily(CanaryTool.java:565)
>       at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:609)
>       at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503)
>       at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471) 
> 2024-06-12 02:57:24 [pool-3-thread-8] WARN 
> client.ConnectionImplementation(428): Checking master connection 
> org.apache.hadoop.hbase.ipc.StoppedRpcClientException: Call to 
> address=HMASTER-1 failed on local exception: 
> org.apache.hadoop.hbase.ipc.StoppedRpcClientException at 
> sun.reflect.GeneratedConstructorAccessor27.newInstance(Unknown Source) at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at 
> org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:206) at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:391)
>  at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:92)
>  at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:425)
>  at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:420)
>  at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:114) at 
> org.apache.hadoop.hbase.ipc.Call.setException(Call.java:129) at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callMethod(AbstractRpcClient.java:446)
>  at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:332)
>  at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$200(AbstractRpcClient.java:92)
>  at 
> org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:595)
>  at 
> org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java)
>  at 
> org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceState.isMasterRunning(ConnectionImplementation.java:1235)
>  at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.lambda$masterConnectionStateSupplier$1(ConnectionImplementation.java:421)
>  at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.isKeepAliveMasterConnectedAndRunning(ConnectionImplementation.java:2025)
>  at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getKeepAliveMasterService(ConnectionImplementation.java:1441)
>  at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getMaster(ConnectionImplementation.java:1433)
>  at 
> org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:57) 
> at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:101)
>  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3020)
>  at 
> org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:579)
>  at org.apache.hadoop.hbase.client.HTable.getDescriptor(HTable.java:242) at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:593) 
> at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503) 
> at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471)
> 2024-06-12 02:57:24 [pool-3-thread-7] ERROR tool.Canary(345): Read from 
> REGION43000. on serverName=REGIONSERVER-4 failed
> org.apache.hadoop.hbase.DoNotRetryIOException: Connection was closed while 
> trying to get master
>       at 
> org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionImplementation.java:1381)
>       at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getKeepAliveMasterService(ConnectionImplementation.java:1445)
>       at 
> org.apache.hadoop.hbase.client.ConnectionImplementation.getMaster(ConnectionImplementation.java:1433)
>       at 
> org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:57)
>       at 
> org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:101)
>       at 
> org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3020)
>       at 
> org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:579)
>       at org.apache.hadoop.hbase.client.HTable.getDescriptor(HTable.java:242)
>       at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.read(CanaryTool.java:593)
>       at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:503)
>       at 
> org.apache.hadoop.hbase.tool.CanaryTool$RegionTask.call(CanaryTool.java:471){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to