[
https://issues.apache.org/jira/browse/HBASE-25774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17340521#comment-17340521
]
Duo Zhang commented on HBASE-25774:
-----------------------------------
I skimmed the code, and I do not think this is easy to fix, as we call
isServerOnline in many places. In RSProcedureDispatcher in particular, we give
up if isServerOnline returns false, which means we assume that procedures are
only ever sent to online servers.
So for now I prefer that we just revert HBASE-25032 and find another way to
avoid assigning regions to region servers that are not fully initialized yet.
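To illustrate the assumption described above, here is a minimal sketch of a
dispatcher that gives up when the target server is not in the online list. It
is not the actual HBase code: OnlineServerTracker, SketchDispatcher, and the
Result enum are hypothetical names, and only the isServerOnline check mirrors
the behavior discussed in this comment.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the master's view of live region servers.
// This is NOT the HBase ServerManager; it only models an "online" set.
final class OnlineServerTracker {
    private final Set<String> online = ConcurrentHashMap.newKeySet();

    void markOnline(String serverName) {
        online.add(serverName);
    }

    void markOffline(String serverName) {
        online.remove(serverName);
    }

    boolean isServerOnline(String serverName) {
        return online.contains(serverName);
    }
}

// Hypothetical dispatcher showing the "give up if not online" assumption.
final class SketchDispatcher {
    enum Result { SENT, GAVE_UP }

    private final OnlineServerTracker tracker;

    SketchDispatcher(OnlineServerTracker tracker) {
        this.tracker = tracker;
    }

    // Runs the remote call only if the target is currently considered online.
    Result dispatch(String serverName, Runnable remoteCall) {
        if (!tracker.isServerOnline(serverName)) {
            // The server is missing from the online list, so the call is dropped.
            return Result.GAVE_UP;
        }
        remoteCall.run();
        return Result.SENT;
    }
}
```

Under this model, a region server that is running but missing from the online
list never receives the call, which is why changing what counts as "online"
affects every caller that relies on this check.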
Thanks.
> ServerManager.getOnlineServer may miss some region servers when refreshing
> state in some procedure implementations
> ------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-25774
> URL: https://issues.apache.org/jira/browse/HBASE-25774
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Reporter: Xiaolin Ha
> Assignee: Duo Zhang
> Priority: Critical
>
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3025/9/testReport/org.apache.hadoop.hbase.replication/TestSyncReplicationStandbyKillRS/precommit_checks___yetus_jdk8_Hadoop3_checks______/]
> {code:java}
> ...[truncated 391170 chars]...
> 76d634:45149.replicationSource,1] regionserver.HRegionServer(2351): STOPPED:
> Unexpected exception in RS:2;ece3af76d634:45149.replicationSource,1
> 2021-04-11T11:14:40,268 INFO [RS:2;ece3af76d634:45149]
> regionserver.HeapMemoryManager(218): Stopping
> 2021-04-11T11:14:40,268 INFO [MemStoreFlusher.0]
> regionserver.MemStoreFlusher$FlushHandler(384): MemStoreFlusher.0 exiting
> 2021-04-11T11:14:40,268 INFO [RS:2;ece3af76d634:45149]
> flush.RegionServerFlushTableProcedureManager(118): Stopping region server
> flush procedure manager abruptly.
> 2021-04-11T11:14:40,270 INFO [RS:2;ece3af76d634:45149]
> snapshot.RegionServerSnapshotManager(136): Stopping
> RegionServerSnapshotManager abruptly.
> 2021-04-11T11:14:40,270 INFO [RS:2;ece3af76d634:45149]
> regionserver.HRegionServer(1146): aborting server
> ece3af76d634,45149,1618139661734
> 2021-04-11T11:14:40,272 ERROR
> [ReplicationExecutor-0.replicationSource,1-ece3af76d634,44745,1618139625245]
> regionserver.ReplicationSource(428): Unexpected exception in
> ReplicationExecutor-0.replicationSource,1-ece3af76d634,44745,1618139625245
> currentPath=null
> java.lang.IllegalStateException: Source should be active.
> at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:547)
> ~[classes/:?]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
> 2021-04-11T11:14:40,272 DEBUG
> [ReplicationExecutor-0.replicationSource,1-ece3af76d634,44745,1618139625245]
> regionserver.HRegionServer(2576): Abort already in progress. Ignoring the
> current request with reason: Unexpected exception in
> ReplicationExecutor-0.replicationSource,1-ece3af76d634,44745,1618139625245
> {code}
> Maybe it should use HBASE-24877 to avoid the failure when initializing
> ReplicationSource.
>