[
https://issues.apache.org/jira/browse/HBASE-24183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083799#comment-17083799
]
Huaxiang Sun commented on HBASE-24183:
--------------------------------------
Forwarding the root cause analysis from the GitHub PR.
There are two separate causes of flakiness.
The first is specific to testAddToSerialPeer: the test just needs to make sure
that the in-memory map of the source RS (not the RS the region moves to)
contains only the new WAL file.
With that fixed, there is still one failure, which is common to
testAddToSerialPeer and testChangeToSerial.
If the old WAL file from before the rollover is still in the in-memory map of
ReplicationSourceManager during a peer disable/enable/config update, it could
still be replicated to the peer cluster again from the beginning. If that
happens, the old WAL entries and the new WAL entries are written to the same
WAL file, which results in out-of-order sequence numbers.
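
To make the failure mode concrete, here is a toy model (not HBase code) of an
old WAL being shipped again after the new WAL's entries. The class and method
names are hypothetical, and the sequence ids other than 122 and 24 are made up
for illustration; only the wording of the message is borrowed from the
assertion quoted in the issue description.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HBase: models entries arriving at the sink.
final class SeqIdOrderSketch {

  /** Concatenates WAL contents in the order the WAL files are shipped. */
  static List<Long> ship(List<List<Long>> walsInShipOrder) {
    List<Long> received = new ArrayList<>();
    for (List<Long> wal : walsInShipOrder) {
      received.addAll(wal);
    }
    return received;
  }

  /** Returns a message for the first decreasing step, or null if ids never decrease. */
  static String firstBackwardsStep(List<Long> seqIds) {
    long prev = -1;
    for (long id : seqIds) {
      if (id < prev) {
        return "Sequence id go backwards from " + prev + " to " + id;
      }
      prev = id;
    }
    return null;
  }

  public static void main(String[] args) {
    List<Long> oldWal = Arrays.asList(24L, 30L);   // illustrative ids before rollover
    List<Long> newWal = Arrays.asList(100L, 122L); // illustrative ids after rollover
    // The old WAL is replayed from the beginning after the new entries.
    System.out.println(firstBackwardsStep(ship(Arrays.asList(newWal, oldWal))));
    // prints "Sequence id go backwards from 122 to 24"
  }
}
```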
waitUntilReplicatedToTheCurrentWALFile() does not really guarantee that the
in-memory map has advanced to the new WAL file; there is a small window in
which the map holds only one WAL file, and it is the old one. I added a new
check to make sure that the in-memory map contains only the new WAL file.
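
The idea behind such a check can be sketched roughly as below. This is not the
code from the PR: the map shape (queue name to set of WAL names), the class
name, and both method names are assumptions made for illustration. The point
is that "exactly one WAL is tracked" is not enough; the single tracked WAL
must also be the new one.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.BooleanSupplier;

// Hypothetical sketch, not the HBase test utility or the actual fix.
final class WalQueueCheckSketch {

  /** True only if every queue tracks exactly one WAL and it is the new one. */
  static boolean onlyTracksWal(Map<String, ? extends Set<String>> walsByQueue,
      String newWalName) {
    if (walsByQueue.isEmpty()) {
      return false;
    }
    for (Set<String> wals : walsByQueue.values()) {
      if (wals.size() != 1 || !wals.contains(newWalName)) {
        return false;
      }
    }
    return true;
  }

  /** Polls the condition until it holds or the timeout expires. */
  static boolean waitFor(long timeoutMs, long intervalMs, BooleanSupplier condition) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() >= deadline) {
        return false;
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt and give up
        return false;
      }
    }
  }
}
```

A test would then wait on something like
waitFor(timeoutMs, intervalMs, () -> onlyTracksWal(map, newWalName)) before
changing the peer, so that the window with only the old WAL file in the map
has already closed.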
This could happen in a production cluster as well; however, I do not think
covering that case is the purpose of these two test cases.
> [flakey test] replication.TestAddToSerialReplicationPeer
> --------------------------------------------------------
>
> Key: HBASE-24183
> URL: https://issues.apache.org/jira/browse/HBASE-24183
> Project: HBase
> Issue Type: Test
> Components: Client
> Affects Versions: 3.0.0, 2.3.0, 2.4.0
> Reporter: Huaxiang Sun
> Assignee: Hua Xiang
> Priority: Major
> Fix For: 3.0.0, 2.3.0
>
>
> On both the 2.3 and branch-2 flaky test boards, this test consistently fails
> with the following error:
>
> {code:java}
> org.apache.hadoop.hbase.replication.TestAddToSerialReplicationPeer.testAddToSerialPeer
> Failing for the past 1 build (Since #6069)
> Took 15 sec.
> Error Message
> Sequence id go backwards from 122 to 24
> Stacktrace
> java.lang.AssertionError: Sequence id go backwards from 122 to 24
> at org.apache.hadoop.hbase.replication.TestAddToSerialReplicationPeer.testAddToSerialPeer(TestAddToSerialReplicationPeer.java:176)
> {code}