David Manning created HBASE-28663:
-------------------------------------

             Summary: CanaryTool continues executing and scanning after timeout
                 Key: HBASE-28663
                 URL: https://issues.apache.org/jira/browse/HBASE-28663
             Project: HBase
          Issue Type: Bug
          Components: canary
    Affects Versions: 2.0.0, 3.0.0
            Reporter: David Manning
            Assignee: David Manning


If you run the {{CanaryTool}} in region mode until it reaches the configured 
timeout, the logs and sink results show that it can continue executing and 
scanning for up to 10 seconds after the timeout.

This is because the {{RegionTask}} instances have already been submitted to an 
{{ExecutorService}}, which continues executing them after the timeout, and the 
Monitor continues execution on a separate thread.
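
The general behavior can be reproduced outside HBase. The following is a minimal sketch using only {{java.util.concurrent}}, not the CanaryTool code: the class name, the latch, and the 50 ms wait are all hypothetical stand-ins. It shows that when a caller stops waiting on a submitted task, the task itself keeps running unless something cancels it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from HBase.
class TimeoutDoesNotStopTasks {

  /** Returns how many simulated scans finished after the caller had already timed out. */
  static int scansCompletedAfterTimeout() throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    AtomicInteger completedAfterTimeout = new AtomicInteger();
    CountDownLatch callerTimedOut = new CountDownLatch(1);

    // Stand-in for a long-running scan task that outlives the caller's timeout.
    Future<?> scan = pool.submit(() -> {
      try {
        callerTimedOut.await();
        completedAfterTimeout.incrementAndGet();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });

    try {
      scan.get(50, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      // The caller gives up here, but nothing has cancelled the task.
    }

    callerTimedOut.countDown(); // the "scan" is still alive and now completes
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
    return completedAfterTimeout.get();
  }

  public static void main(String[] args) throws Exception {
    System.out.println("scans completed after timeout: " + scansCompletedAfterTimeout());
  }
}
```

In this sketch the task completes after the timeout because the caller never calls {{Future#cancel}} or {{shutdownNow}}, which is the same shape of problem described above.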

The 10 seconds is seen in HBase 2.x, at least, because {{runMonitor}} closes 
the {{Connection}} 
([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-server/src/main/java/org/apache/hadoop/hbase/tool/CanaryTool.java#L1054-L1094]).
 That leads to {{ConnectionImplementation#close}} 
([code|https://github.com/apache/hbase/blob/e865c852c0e9a1e9b55b9d1512d379072d3e7a7b/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L2272-L2300]),
 and inside {{shutdownPools}} we may wait the full 10 seconds of 
{{awaitTermination}} if client operations are still in progress.
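
As a rough illustration of why the wait can run to the full timeout, here is a sketch with a plain {{ExecutorService}}. It is not the {{shutdownPools}} implementation; the class name is hypothetical and the timeout is shortened to 200 ms instead of 10 seconds. {{shutdown()}} does not interrupt running work, so {{awaitTermination}} blocks until the in-progress task finishes or the timeout elapses, whereas {{shutdownNow()}} interrupts the worker.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; this is not HBase's shutdownPools.
class AwaitTerminationSketch {

  /**
   * Returns {terminatedAfterShutdown, terminatedAfterShutdownNow} for a pool
   * that has one client operation still in progress.
   */
  static boolean[] run() throws InterruptedException {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch neverReleased = new CountDownLatch(1);

    // Stand-in for an in-progress client operation that only stops when interrupted.
    pool.submit(() -> {
      started.countDown();
      try {
        neverReleased.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    started.await();

    // Graceful shutdown: no interrupt is sent, so we wait out the whole timeout.
    pool.shutdown();
    boolean terminatedAfterShutdown = pool.awaitTermination(200, TimeUnit.MILLISECONDS);

    // Forced shutdown: the worker is interrupted and the pool terminates quickly.
    pool.shutdownNow();
    boolean terminatedAfterShutdownNow = pool.awaitTermination(5, TimeUnit.SECONDS);

    return new boolean[] {terminatedAfterShutdown, terminatedAfterShutdownNow};
  }

  public static void main(String[] args) throws InterruptedException {
    boolean[] result = run();
    System.out.println("after shutdown(): " + result[0] + ", after shutdownNow(): " + result[1]);
  }
}
```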

The scenario can be improved by simply interrupting the monitor thread. We 
will often be in an {{invokeAll}} call in a {{sniff}} method, and interrupting 
that call will in turn interrupt the client threads, so everything generally 
shuts down properly and promptly. However, we could be more robust by also 
watching for a shutdown signal in the various tasks such as {{RegionTask}}.
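
Both ideas can be sketched with plain {{java.util.concurrent}}. This is not a proposed patch: the class, method, and field names below are hypothetical, and the tasks only simulate scans. The first method relies on the documented behavior that {{ExecutorService#invokeAll}} cancels unfinished tasks when the calling thread is interrupted while waiting; the second shows tasks polling a shared shutdown flag between units of work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; none of these names come from CanaryTool.
class MonitorShutdownSketch {

  /** Interrupts a "monitor" thread blocked in invokeAll; returns how many tasks were interrupted. */
  static int interruptMonitor() throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    AtomicInteger interruptedTasks = new AtomicInteger();
    CountDownLatch started = new CountDownLatch(2);

    List<Callable<Void>> tasks = new ArrayList<>();
    for (int i = 0; i < 2; i++) {
      tasks.add(() -> {
        started.countDown();
        try {
          Thread.sleep(60_000); // stand-in for a slow scan
        } catch (InterruptedException e) {
          interruptedTasks.incrementAndGet();
        }
        return null;
      });
    }

    Thread monitor = new Thread(() -> {
      try {
        pool.invokeAll(tasks);
      } catch (InterruptedException e) {
        // invokeAll has already cancelled the unfinished tasks at this point.
      }
    });
    monitor.start();
    started.await();     // both tasks are running
    monitor.interrupt(); // the timeout handler would do this
    monitor.join();
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
    return interruptedTasks.get();
  }

  /** Tasks poll a shared shutdown flag; returns whether the pool terminated in time. */
  static boolean cooperativeStop() throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    AtomicBoolean shutdownRequested = new AtomicBoolean(false);
    CountDownLatch started = new CountDownLatch(2);

    for (int i = 0; i < 2; i++) {
      pool.submit(() -> {
        started.countDown();
        // Check the flag before each unit of work instead of relying on interrupts.
        while (!shutdownRequested.get()) {
          Thread.onSpinWait(); // stand-in for scanning one region or row
        }
      });
    }
    started.await();
    shutdownRequested.set(true);
    pool.shutdown();
    return pool.awaitTermination(5, TimeUnit.SECONDS);
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("tasks interrupted via invokeAll: " + interruptMonitor());
    System.out.println("pool terminated via flag: " + cooperativeStop());
  }
}
```

The flag-based variant does not depend on every blocking call honoring interrupts, which is why it may be the more robust of the two in practice.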



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
