Apache9 commented on PR #4762:
URL: https://github.com/apache/hbase/pull/4762#issuecomment-1243943159

   OK, we have this check in the code:
   
   ```
       if (!this.isSourceActive()) {
         setSourceStartupStatus(false);
         if (Thread.currentThread().isInterrupted()) {
           // If source is not running and thread is interrupted this means someone has tried to
           // remove this peer.
           return;
         }
   
         retryStartup.set(!this.abortOnError);
         throw new IllegalStateException("Source should be active.");
       }
   ```
   
   ```
   2022-09-12T23:00:16,183 WARN  [RS_REFRESH_PEER-regionserver/zhangduo-VirtualBox:0-0.replicationSource,1-zhangduo-virtualbox,34201,1662994782531] replication.HBaseReplicationEndpoint(206): 1 Failed to create connection for peer cluster
   java.io.InterruptedIOException: null
           at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:184) ~[hbase-common-3.0.0-alpha-4-SNAPSHOT.jar:?]
           at org.apache.hadoop.hbase.client.ClusterConnectionFactory.createAsyncClusterConnection(ClusterConnectionFactory.java:43) ~[classes/:?]
           at org.apache.hadoop.hbase.client.ClusterConnectionFactory.createAsyncClusterConnection(ClusterConnectionFactory.java:67) ~[classes/:?]
           at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.createConnection(HBaseReplicationEndpoint.java:95) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
           at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.connectPeerCluster(HBaseReplicationEndpoint.java:204) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
           at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.doStart(HBaseReplicationEndpoint.java:157) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
           at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:251) ~[hbase-shaded-miscellaneous-4.1.1.jar:4.1.1]
           at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.start(HBaseReplicationEndpoint.java:145) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
           at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initAndStartReplicationEndpoint(ReplicationSource.java:327) ~[classes/:?]
           at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:555) ~[classes/:?]
           at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
   Caused by: java.lang.InterruptedException
           at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347) ~[?:1.8.0_292]
           at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) ~[?:1.8.0_292]
           at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:182) ~[hbase-common-3.0.0-alpha-4-SNAPSHOT.jar:?]
           ... 10 more
   2022-09-12T23:00:16,184 WARN  [RS_REFRESH_PEER-regionserver/zhangduo-VirtualBox:0-0.replicationSource,1-zhangduo-virtualbox,34201,1662994782531] regionserver.ReplicationSource(559): peerId=1, Error starting ReplicationEndpoint, retry
   java.lang.IllegalStateException: Expected the service HBaseInterClusterReplicationEndpoint [FAILED] to be RUNNING, but the service has FAILED
           at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:381) ~[hbase-shaded-miscellaneous-4.1.1.jar:4.1.1]
           at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:321) ~[hbase-shaded-miscellaneous-4.1.1.jar:4.1.1]
           at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initAndStartReplicationEndpoint(ReplicationSource.java:328) ~[classes/:?]
           at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:555) ~[classes/:?]
           at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
   Caused by: java.io.InterruptedIOException
           at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:184) ~[hbase-common-3.0.0-alpha-4-SNAPSHOT.jar:?]
           at org.apache.hadoop.hbase.client.ClusterConnectionFactory.createAsyncClusterConnection(ClusterConnectionFactory.java:43) ~[classes/:?]
           at org.apache.hadoop.hbase.client.ClusterConnectionFactory.createAsyncClusterConnection(ClusterConnectionFactory.java:67) ~[classes/:?]
           at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.createConnection(HBaseReplicationEndpoint.java:95) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
           at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.connectPeerCluster(HBaseReplicationEndpoint.java:204) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
           at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.doStart(HBaseReplicationEndpoint.java:157) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
           at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:251) ~[hbase-shaded-miscellaneous-4.1.1.jar:4.1.1]
           at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.start(HBaseReplicationEndpoint.java:145) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
           at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initAndStartReplicationEndpoint(ReplicationSource.java:327) ~[classes/:?]
           ... 2 more
   Caused by: java.lang.InterruptedException
           at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347) ~[?:1.8.0_292]
           at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) ~[?:1.8.0_292]
           at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:182) ~[hbase-common-3.0.0-alpha-4-SNAPSHOT.jar:?]
           at org.apache.hadoop.hbase.client.ClusterConnectionFactory.createAsyncClusterConnection(ClusterConnectionFactory.java:43) ~[classes/:?]
           at org.apache.hadoop.hbase.client.ClusterConnectionFactory.createAsyncClusterConnection(ClusterConnectionFactory.java:67) ~[classes/:?]
           at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.createConnection(HBaseReplicationEndpoint.java:95) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
           at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.connectPeerCluster(HBaseReplicationEndpoint.java:204) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
           at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.doStart(HBaseReplicationEndpoint.java:157) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
           at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:251) ~[hbase-shaded-miscellaneous-4.1.1.jar:4.1.1]
           at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.start(HBaseReplicationEndpoint.java:145) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
           at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initAndStartReplicationEndpoint(ReplicationSource.java:327) ~[classes/:?]
           ... 2 more
   ```
   
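   But this check does not seem to be enough: FutureUtils.get wraps the InterruptedException in an InterruptedIOException, and the upper layer just logs it as a warning without restoring the thread's interrupted state, so a later Thread.currentThread().isInterrupted() check can return false even though the thread was interrupted...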
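
To illustrate the point about the lost interrupted state, here is a minimal, self-contained sketch. It is not the HBase code: the class `InterruptFlagSketch` and its methods are hypothetical names, and the `get` helper only imitates the wrapping of InterruptedException into InterruptedIOException described above. It shows that once a blocking call has thrown InterruptedException, the flag is cleared unless somebody sets it again.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class InterruptFlagSketch {

  // Hypothetical helper: wraps InterruptedException in InterruptedIOException.
  // When restoreFlag is true it also re-sets the thread's interrupt flag.
  static <T> T get(CompletableFuture<T> future, boolean restoreFlag) throws IOException {
    try {
      return future.get();
    } catch (InterruptedException e) {
      if (restoreFlag) {
        Thread.currentThread().interrupt();
      }
      throw (IOException) new InterruptedIOException().initCause(e);
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    }
  }

  // Simulates an interrupted thread whose caller only "warns" on the IOException,
  // then reports whether the interrupt flag is still visible afterwards.
  static boolean flagVisibleAfterWarn(boolean restoreFlag) {
    Thread.currentThread().interrupt(); // stands in for someone removing the peer
    try {
      // The future is never completed, so get() fails with InterruptedException.
      get(new CompletableFuture<String>(), restoreFlag);
    } catch (IOException e) {
      // The upper layer just logs here and does nothing with the interrupt.
    }
    // Thread.interrupted() reads the flag and clears it, keeping calls independent.
    return Thread.interrupted();
  }

  public static void main(String[] args) {
    System.out.println("flag visible without restore: " + flagVisibleAfterWarn(false));
    System.out.println("flag visible with restore:    " + flagVisibleAfterWarn(true));
  }
}
```

In this sketch, restoring the flag inside the wrapper is only one possible way to keep the interrupt visible to later checks; whether that is the right fix for this PR is a separate question.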
   
   Personally, I think we should avoid using Thread.isInterrupted to decide whether we should abort a region server... Let me think about how to fix this better...
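
As one possible direction, and purely as an assumption on my side rather than the fix chosen for this PR, the decision could be driven by an explicit flag instead of the thread's interrupt state. Everything in the sketch below (`ExplicitStopFlagSketch`, `markRemoved`, `onStartupFailure`, the `Decision` enum) is a hypothetical name used for illustration only.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ExplicitStopFlagSketch {

  // Possible outcomes when the source fails to start (hypothetical).
  enum Decision { STOP_QUIETLY, RETRY, ABORT }

  // Set explicitly when the peer is removed; unlike the interrupt flag,
  // it is not cleared as a side effect of a blocking call throwing.
  private final AtomicBoolean removed = new AtomicBoolean(false);
  private final boolean abortOnError;

  ExplicitStopFlagSketch(boolean abortOnError) {
    this.abortOnError = abortOnError;
  }

  // Called by whoever removes the peer, in addition to (or instead of) interrupting.
  void markRemoved() {
    removed.set(true);
  }

  // Decides what to do after a startup failure without consulting Thread.isInterrupted().
  Decision onStartupFailure() {
    if (removed.get()) {
      return Decision.STOP_QUIETLY;
    }
    return abortOnError ? Decision.ABORT : Decision.RETRY;
  }

  public static void main(String[] args) {
    ExplicitStopFlagSketch source = new ExplicitStopFlagSketch(true);
    System.out.println(source.onStartupFailure());
    source.markRemoved();
    System.out.println(source.onStartupFailure());
  }
}
```

The design choice here is that the flag has a single writer-visible meaning (the peer was removed), so swallowing or wrapping an InterruptedException elsewhere cannot change the outcome.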

