[ 
https://issues.apache.org/jira/browse/HBASE-25774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323945#comment-17323945
 ] 

Xiaolin Ha edited comment on HBASE-25774 at 4/16/21, 5:04 PM:
--------------------------------------------------------------

Yes, [~zhangduo]. There are no race conditions between 
ReplicationSourceManager#addSource and 
PeerProcedureHandlerImpl#refreshPeerState.

As a result, when RSes in UT restart, and is starting the replication service 
by `startReplicationService`, concurrently 
`transitReplicationPeerSyncReplicationState` (UT line 91) will make the 
initialize of a replication source terminate and throws Exception, then the RS 
will be aborted by this exception because it is in starting the replication 
service. 

Logs can be seen, 
{code:java}
76d634:45149.replicationSource,1] regionserver.HRegionServer(2351): STOPPED: 
Unexpected exception in RS:2;ece3af76d634:45149.replicationSource,1
{code}


was (Author: xiaolin ha):
Yes, [~zhangduo]. There are no race conditions between 
ReplicationSourceManager#addSource and 
PeerProcedureHandlerImpl#refreshPeerState.

As a result, when RSes in UT restart, and is starting the replication service 
by `startReplicationService`, concurrently 
`transitReplicationPeerSyncReplicationState` (UT line 91) will make the 
initialize of a replication source terminate.

> TestSyncReplicationStandbyKillRS#testStandbyKillRegionServer is flaky
> ---------------------------------------------------------------------
>
>                 Key: HBASE-25774
>                 URL: https://issues.apache.org/jira/browse/HBASE-25774
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Xiaolin Ha
>            Assignee: Xiaolin Ha
>            Priority: Major
>
> [https://ci-hadoop.apache.org/job/HBase/job/HBase-PreCommit-GitHub-PR/job/PR-3025/9/testReport/org.apache.hadoop.hbase.replication/TestSyncReplicationStandbyKillRS/precommit_checks___yetus_jdk8_Hadoop3_checks______/]
> {code:java}
> ...[truncated 391170 chars]...
> 76d634:45149.replicationSource,1] regionserver.HRegionServer(2351): STOPPED: 
> Unexpected exception in RS:2;ece3af76d634:45149.replicationSource,1
> 2021-04-11T11:14:40,268 INFO  [RS:2;ece3af76d634:45149] 
> regionserver.HeapMemoryManager(218): Stopping
> 2021-04-11T11:14:40,268 INFO  [MemStoreFlusher.0] 
> regionserver.MemStoreFlusher$FlushHandler(384): MemStoreFlusher.0 exiting
> 2021-04-11T11:14:40,268 INFO  [RS:2;ece3af76d634:45149] 
> flush.RegionServerFlushTableProcedureManager(118): Stopping region server 
> flush procedure manager abruptly.
> 2021-04-11T11:14:40,270 INFO  [RS:2;ece3af76d634:45149] 
> snapshot.RegionServerSnapshotManager(136): Stopping 
> RegionServerSnapshotManager abruptly.
> 2021-04-11T11:14:40,270 INFO  [RS:2;ece3af76d634:45149] 
> regionserver.HRegionServer(1146): aborting server 
> ece3af76d634,45149,1618139661734
> 2021-04-11T11:14:40,272 ERROR 
> [ReplicationExecutor-0.replicationSource,1-ece3af76d634,44745,1618139625245] 
> regionserver.ReplicationSource(428): Unexpected exception in 
> ReplicationExecutor-0.replicationSource,1-ece3af76d634,44745,1618139625245 
> currentPath=null
> java.lang.IllegalStateException: Source should be active.
>       at 
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.initialize(ReplicationSource.java:547)
>  ~[classes/:?]
>       at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
> 2021-04-11T11:14:40,272 DEBUG 
> [ReplicationExecutor-0.replicationSource,1-ece3af76d634,44745,1618139625245] 
> regionserver.HRegionServer(2576): Abort already in progress. Ignoring the 
> current request with reason: Unexpected exception in 
> ReplicationExecutor-0.replicationSource,1-ece3af76d634,44745,1618139625245
> {code}
> Maybe it should use HBASE-24877 to avoid failure of the initialize of 
> ReplicationSource.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to