[ https://issues.apache.org/jira/browse/HBASE-20426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453312#comment-16453312 ]
Duo Zhang commented on HBASE-20426:
-----------------------------------
The error message is a bit strange:
{noformat}
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
	at org.apache.hadoop.hbase.replication.regionserver.TestSyncReplicationShipperQuit.testShipperQuitWhenDA(TestSyncReplicationShipperQuit.java:59)
{noformat}
It points at this line:
{code}
writeAndVerifyReplication(UTIL1, UTIL2, 0, 100);
{code}
Inside that call, only this line can throw this exception:
{code}
HRegion region = util.getMiniHBaseCluster().getRegions(TABLE_NAME).get(0);
{code}
But that is unlikely to be the root cause...
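A small sketch of why `get(0)` fits the symptom: on an empty `java.util.ArrayList`, `get(0)` throws `IndexOutOfBoundsException`, and on JDK 8 the message has exactly the form `Index: 0, Size: 0`. So the failure only says that `getRegions(TABLE_NAME)` returned an empty list; one possible reason is that the region server hosting the table had already aborted. `firstRegionOrFail` is a hypothetical helper, not an HBase API, strings stand in for `HRegion` so the sketch runs without HBase, and the table name is made up.

```java
import java.util.ArrayList;
import java.util.List;

public class FirstRegionCheck {

  // Hypothetical helper (not part of HBase): returns the first region, or
  // fails with a message naming the table instead of a bare
  // IndexOutOfBoundsException from get(0).
  static <T> T firstRegionOrFail(List<T> regions, String tableName) {
    if (regions.isEmpty()) {
      throw new IllegalStateException(
          "No online regions for table " + tableName
              + "; the hosting region server may have aborted");
    }
    return regions.get(0);
  }

  public static void main(String[] args) {
    // Strings stand in for HRegion so this runs without HBase.
    List<String> empty = new ArrayList<>();

    // What the test effectively does: get(0) on an empty list.
    try {
      empty.get(0);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("get(0) failed: " + e.getMessage());
    }

    // The defensive variant reports which table had no regions.
    try {
      firstRegionOrFail(empty, "test-table");
    } catch (IllegalStateException e) {
      System.out.println("helper failed: " + e.getMessage());
    }
  }
}
```

A clearer message would make the symptom easier to read, but it still would not explain why the regions were gone.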
{noformat}
2018-04-25 18:09:37,915 ERROR [regionserver/829678c9d487:0.logRoller.replicationSource.wal-reader.829678c9d487%2C43261%2C1524679749179,1] regionserver.ReplicationSource(331): Unexpected exception in regionserver/829678c9d487:0.logRoller.replicationSource.wal-reader.829678c9d487%2C43261%2C1524679749179,1 currentPath=hdfs://localhost:49088/user/jenkins/test-data/df896084-e847-4cbd-bc5c-1899d7d727bb/WALs/829678c9d487,43261,1524679749179/829678c9d487%2C43261%2C1524679749179.1524679772557
java.lang.NullPointerException
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.filterEntry(ReplicationSourceWALReader.java:283)
	at org.apache.hadoop.hbase.replication.regionserver.SerialReplicationSourceWALReader.readWALEntries(SerialReplicationSourceWALReader.java:93)
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:136)
	at org.apache.hadoop.hbase.replication.regionserver.SerialReplicationSourceWALReader.run(SerialReplicationSourceWALReader.java:34)
2018-04-25 18:09:37,919 ERROR [regionserver/829678c9d487:0.logRoller.replicationSource.wal-reader.829678c9d487%2C43261%2C1524679749179,1] helpers.MarkerIgnoringBase(159): ***** ABORTING region server 829678c9d487,43261,1524679749179: Unexpected exception in regionserver/829678c9d487:0.logRoller.replicationSource.wal-reader.829678c9d487%2C43261%2C1524679749179,1 *****
java.lang.NullPointerException
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.filterEntry(ReplicationSourceWALReader.java:283)
	at org.apache.hadoop.hbase.replication.regionserver.SerialReplicationSourceWALReader.readWALEntries(SerialReplicationSourceWALReader.java:93)
	at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:136)
	at org.apache.hadoop.hbase.replication.regionserver.SerialReplicationSourceWALReader.run(SerialReplicationSourceWALReader.java:34)
{noformat}
Let me dig more.
> Give up replicating anything in S state
> ---------------------------------------
>
> Key: HBASE-20426
> URL: https://issues.apache.org/jira/browse/HBASE-20426
> Project: HBase
> Issue Type: Sub-task
> Components: Replication
> Reporter: Duo Zhang
> Assignee: Duo Zhang
> Priority: Major
> Fix For: HBASE-19064
>
> Attachments: HBASE-20426-HBASE-19064-v1.patch,
> HBASE-20426-HBASE-19064-v1.patch, HBASE-20426-HBASE-19064.patch,
> HBASE-20426-HBASE-19064.patch, HBASE-20426-HBASE-19064.patch,
> HBASE-20426-UT.patch
>
>
> When we transition the remote S cluster to DA, and then transition the old A
> cluster to S, it is possible that the old A cluster still has some entries
> which have not been replicated yet, and then the async replication will be
> blocked.
> This may also lead to data inconsistency after we transition it back to DA
> later, as these entries will be replicated again, but the new data which was
> replicated from the remote cluster will not be replicated back, which
> introduces a hole in the replication.
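The policy in the summary, "give up replicating anything in S state", can be illustrated with a toy model. Everything here is hypothetical: `State`, `ToyReplicationSource` and its methods are illustrative names, not HBase classes, and the rule that nothing is queued while in S is a simplifying assumption. What it shows is that once the old A cluster is transitioned to S, entries it had not replicated yet are dropped instead of shipped, so they can neither block async replication nor be replayed after a later transition back to DA.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class GiveUpInStandbySketch {

  // Hypothetical sync replication states: A (active), S (standby),
  // DA (downgrade active).
  enum State { A, S, DA }

  // Toy replication source: queues WAL entries and ships them to the peer.
  static class ToyReplicationSource {
    private final Deque<String> pending = new ArrayDeque<>();
    private final List<String> shipped = new ArrayList<>();
    private State state;

    ToyReplicationSource(State initial) {
      this.state = initial;
    }

    void append(String entry) {
      // Toy assumption: entries appended while in S are never queued.
      if (state != State.S) {
        pending.addLast(entry);
      }
    }

    void transitTo(State newState) {
      state = newState;
      if (newState == State.S) {
        // Give up whatever the old A cluster had not replicated yet.
        pending.clear();
      }
    }

    // Ships every entry that is still queued.
    void ship() {
      while (!pending.isEmpty()) {
        shipped.add(pending.pollFirst());
      }
    }

    List<String> shipped() { return shipped; }

    int pendingCount() { return pending.size(); }
  }

  public static void main(String[] args) {
    ToyReplicationSource source = new ToyReplicationSource(State.A);
    source.append("e1");
    source.ship();              // e1 reaches the peer while we are A
    source.append("e2");        // e2 is still queued when the remote goes DA
    source.transitTo(State.S);  // e2 is dropped here
    source.transitTo(State.DA);
    source.ship();              // e2 is not replayed after moving back to DA
    System.out.println(source.shipped()); // prints [e1]
  }
}
```

Dropping at the transition point, rather than filtering at ship time, keeps the queue empty while in S, so nothing stale is left to block the shipper.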
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)