Apache9 commented on PR #4762:
URL: https://github.com/apache/hbase/pull/4762#issuecomment-1243943159
OK, we have this check in the code:
```
if (!this.isSourceActive()) {
  setSourceStartupStatus(false);
  if (Thread.currentThread().isInterrupted()) {
    // If source is not running and thread is interrupted this means someone has tried to
    // remove this peer.
    return;
  }
  retryStartup.set(!this.abortOnError);
  throw new IllegalStateException("Source should be active.");
}
```
```
2022-09-12T23:00:16,183 WARN [RS_REFRESH_PEER-regionserver/zhangduo-VirtualBox:0-0.replicationSource,1-zhangduo-virtualbox,34201,1662994782531] replication.HBaseReplicationEndpoint(206): 1 Failed to create connection for peer cluster
java.io.InterruptedIOException: null
    at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:184) ~[hbase-common-3.0.0-alpha-4-SNAPSHOT.jar:?]
    at org.apache.hadoop.hbase.client.ClusterConnectionFactory.createAsyncClusterConnection(ClusterConnectionFactory.java:43) ~[classes/:?]
    at org.apache.hadoop.hbase.client.ClusterConnectionFactory.createAsyncClusterConnection(ClusterConnectionFactory.java:67) ~[classes/:?]
    at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.createConnection(HBaseReplicationEndpoint.java:95) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.connectPeerCluster(HBaseReplicationEndpoint.java:204) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.doStart(HBaseReplicationEndpoint.java:157) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:251) ~[hbase-shaded-miscellaneous-4.1.1.jar:4.1.1]
    at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.start(HBaseReplicationEndpoint.java:145) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initAndStartReplicationEndpoint(ReplicationSource.java:327) ~[classes/:?]
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:555) ~[classes/:?]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
Caused by: java.lang.InterruptedException
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347) ~[?:1.8.0_292]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) ~[?:1.8.0_292]
    at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:182) ~[hbase-common-3.0.0-alpha-4-SNAPSHOT.jar:?]
    ... 10 more
2022-09-12T23:00:16,184 WARN [RS_REFRESH_PEER-regionserver/zhangduo-VirtualBox:0-0.replicationSource,1-zhangduo-virtualbox,34201,1662994782531] regionserver.ReplicationSource(559): peerId=1, Error starting ReplicationEndpoint, retry
java.lang.IllegalStateException: Expected the service HBaseInterClusterReplicationEndpoint [FAILED] to be RUNNING, but the service has FAILED
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.checkCurrentState(AbstractService.java:381) ~[hbase-shaded-miscellaneous-4.1.1.jar:4.1.1]
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.awaitRunning(AbstractService.java:321) ~[hbase-shaded-miscellaneous-4.1.1.jar:4.1.1]
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initAndStartReplicationEndpoint(ReplicationSource.java:328) ~[classes/:?]
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:555) ~[classes/:?]
    at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_292]
Caused by: java.io.InterruptedIOException
    at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:184) ~[hbase-common-3.0.0-alpha-4-SNAPSHOT.jar:?]
    at org.apache.hadoop.hbase.client.ClusterConnectionFactory.createAsyncClusterConnection(ClusterConnectionFactory.java:43) ~[classes/:?]
    at org.apache.hadoop.hbase.client.ClusterConnectionFactory.createAsyncClusterConnection(ClusterConnectionFactory.java:67) ~[classes/:?]
    at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.createConnection(HBaseReplicationEndpoint.java:95) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.connectPeerCluster(HBaseReplicationEndpoint.java:204) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.doStart(HBaseReplicationEndpoint.java:157) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:251) ~[hbase-shaded-miscellaneous-4.1.1.jar:4.1.1]
    at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.start(HBaseReplicationEndpoint.java:145) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initAndStartReplicationEndpoint(ReplicationSource.java:327) ~[classes/:?]
    ... 2 more
Caused by: java.lang.InterruptedException
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:347) ~[?:1.8.0_292]
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) ~[?:1.8.0_292]
    at org.apache.hadoop.hbase.util.FutureUtils.get(FutureUtils.java:182) ~[hbase-common-3.0.0-alpha-4-SNAPSHOT.jar:?]
    at org.apache.hadoop.hbase.client.ClusterConnectionFactory.createAsyncClusterConnection(ClusterConnectionFactory.java:43) ~[classes/:?]
    at org.apache.hadoop.hbase.client.ClusterConnectionFactory.createAsyncClusterConnection(ClusterConnectionFactory.java:67) ~[classes/:?]
    at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.createConnection(HBaseReplicationEndpoint.java:95) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.connectPeerCluster(HBaseReplicationEndpoint.java:204) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.doStart(HBaseReplicationEndpoint.java:157) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hbase.thirdparty.com.google.common.util.concurrent.AbstractService.startAsync(AbstractService.java:251) ~[hbase-shaded-miscellaneous-4.1.1.jar:4.1.1]
    at org.apache.hadoop.hbase.replication.HBaseReplicationEndpoint.start(HBaseReplicationEndpoint.java:145) ~[classes/:3.0.0-alpha-4-SNAPSHOT]
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initAndStartReplicationEndpoint(ReplicationSource.java:327) ~[classes/:?]
    ... 2 more
```
But it seems this check is not enough: FutureUtils.get wraps the
InterruptedException in an InterruptedIOException, and the upper layer only
logs it as a warning without restoring the interrupted state, so the
isInterrupted() check above no longer sees the interrupt...
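
To illustrate the point about the lost interrupt, here is a minimal, self-contained sketch. The class and method names (`InterruptStatusDemo`, `getWrapping`, and the two callers) are made up for this example; `getWrapping` is only a stand-in for a helper that converts `InterruptedException` into `InterruptedIOException`, not the actual `FutureUtils.get` code.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;

public class InterruptStatusDemo {

  // Hypothetical stand-in for a helper that wraps InterruptedException.
  static <T> T getWrapping(CompletableFuture<T> future) throws IOException {
    try {
      return future.get();
    } catch (InterruptedException e) {
      // The interrupt flag has already been cleared at this point.
      throw (InterruptedIOException) new InterruptedIOException().initCause(e);
    } catch (ExecutionException e) {
      throw new IOException(e.getCause());
    }
  }

  // Caller that only logs the failure: the interrupt is lost.
  static boolean interruptedAfterLogOnly() {
    Thread.currentThread().interrupt(); // simulate an interrupt from peer removal
    try {
      getWrapping(new CompletableFuture<String>());
    } catch (IOException e) {
      // Imagine a LOG.warn(...) here and nothing else.
    }
    // Thread.interrupted() reads the flag and clears it again for the demo.
    return Thread.interrupted();
  }

  // Caller that restores the interrupt status before moving on.
  static boolean interruptedAfterRestore() {
    Thread.currentThread().interrupt();
    try {
      getWrapping(new CompletableFuture<String>());
    } catch (InterruptedIOException e) {
      Thread.currentThread().interrupt(); // put the flag back
    } catch (IOException e) {
      // Not expected in this demo.
    }
    return Thread.interrupted();
  }

  public static void main(String[] args) {
    System.out.println("log only: " + interruptedAfterLogOnly());
    System.out.println("restored: " + interruptedAfterRestore());
  }
}
```

In the log-only caller the flag is gone by the time any later `isInterrupted()` check would run, while the restoring caller still observes it.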
Personally, I think we should avoid using Thread.isInterrupted to decide
whether we should abort a region server... Let me think about how to fix this
properly...
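
As a rough illustration of one possible direction (not necessarily the fix this PR ends up with), the decision could be driven by an explicit flag instead of the thread's interrupt status. Everything in this sketch is hypothetical: `ExplicitStopFlagDemo`, `Source`, `markPeerRemoved`, `onStartupFailure`, and the `Decision` enum are invented names and do not exist in HBase.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ExplicitStopFlagDemo {

  // Hypothetical outcomes of a failed startup attempt.
  enum Decision { EXIT_QUIETLY, RETRY, ABORT }

  // Hypothetical, heavily simplified stand-in for a replication source.
  static class Source {
    private final AtomicBoolean peerRemoved = new AtomicBoolean(false);
    private final boolean abortOnError;

    Source(boolean abortOnError) {
      this.abortOnError = abortOnError;
    }

    // Called by whoever removes the peer, instead of relying on interrupt().
    void markPeerRemoved() {
      peerRemoved.set(true);
    }

    // The flag survives any exception wrapping or swallowing in lower layers,
    // unlike the thread's interrupt status.
    Decision onStartupFailure() {
      if (peerRemoved.get()) {
        return Decision.EXIT_QUIETLY;
      }
      return abortOnError ? Decision.ABORT : Decision.RETRY;
    }
  }

  public static void main(String[] args) {
    Source source = new Source(true);
    System.out.println(source.onStartupFailure());
    source.markPeerRemoved();
    System.out.println(source.onStartupFailure());
  }
}
```

The trade-off is that every code path that removes a peer has to set the flag explicitly, but the result no longer depends on whether some intermediate layer preserved the interrupt.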