[jira] [Commented] (HBASE-20426) Give up replicating anything in S state

Duo Zhang (JIRA) Wed, 25 Apr 2018 18:06:32 -0700

    [ https://issues.apache.org/jira/browse/HBASE-20426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16453312#comment-16453312 ]


Duo Zhang commented on HBASE-20426:
-----------------------------------

The error message is a bit strange:

{noformat}
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
        at org.apache.hadoop.hbase.replication.regionserver.TestSyncReplicationShipperQuit.testShipperQuitWhenDA(TestSyncReplicationShipperQuit.java:59)
{noformat}

It points to this line:
{code}
writeAndVerifyReplication(UTIL1, UTIL2, 0, 100);
{code}

Inside that call, only this line could throw this exception:
{code}
HRegion region = util.getMiniHBaseCluster().getRegions(TABLE_NAME).get(0);
{code}

But this is unlikely to be the root cause...
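
For reference, calling get(0) on an empty List is exactly what raises an IndexOutOfBoundsException, so the test most likely saw no regions for the table at that moment. Below is a minimal sketch of a guard that would make such a failure easier to read. The class FirstRegionLookup and the method firstOrFail are hypothetical names used only for illustration; they are not part of HBase or of the test, and plain strings stand in for HRegion objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, NOT part of HBase: it only illustrates the failure mode.
final class FirstRegionLookup {

  // Returns the first element, or fails with a message that names the table
  // instead of a bare IndexOutOfBoundsException.
  static <T> T firstOrFail(List<T> regions, String tableName) {
    if (regions.isEmpty()) {
      throw new IllegalStateException("No regions found for table " + tableName);
    }
    return regions.get(0);
  }

  public static void main(String[] args) {
    // Stand-in for the region list returned by the mini cluster (assumption).
    List<String> regions = new ArrayList<>();

    // Unguarded access: an empty list throws IndexOutOfBoundsException.
    try {
      regions.get(0);
    } catch (IndexOutOfBoundsException e) {
      System.out.println("unguarded: " + e.getClass().getSimpleName());
    }

    // Guarded access: the failure now says which table had no regions.
    try {
      firstOrFail(regions, "SyncRep");
    } catch (IllegalStateException e) {
      System.out.println("guarded: " + e.getMessage());
    }

    // With at least one element the helper simply returns the first one.
    System.out.println(firstOrFail(Arrays.asList("region-a", "region-b"), "SyncRep"));
  }
}
```

A guard like this only improves the message. It does not explain why the list was empty, which is the part that still needs digging.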

I also see this in the output:

{noformat}
2018-04-25 18:09:37,915 ERROR [regionserver/829678c9d487:0.logRoller.replicationSource.wal-reader.829678c9d487%2C43261%2C1524679749179,1] regionserver.ReplicationSource(331): Unexpected exception in regionserver/829678c9d487:0.logRoller.replicationSource.wal-reader.829678c9d487%2C43261%2C1524679749179,1 currentPath=hdfs://localhost:49088/user/jenkins/test-data/df896084-e847-4cbd-bc5c-1899d7d727bb/WALs/829678c9d487,43261,1524679749179/829678c9d487%2C43261%2C1524679749179.1524679772557
java.lang.NullPointerException
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.filterEntry(ReplicationSourceWALReader.java:283)
        at org.apache.hadoop.hbase.replication.regionserver.SerialReplicationSourceWALReader.readWALEntries(SerialReplicationSourceWALReader.java:93)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:136)
        at org.apache.hadoop.hbase.replication.regionserver.SerialReplicationSourceWALReader.run(SerialReplicationSourceWALReader.java:34)
2018-04-25 18:09:37,919 ERROR [regionserver/829678c9d487:0.logRoller.replicationSource.wal-reader.829678c9d487%2C43261%2C1524679749179,1] helpers.MarkerIgnoringBase(159): ***** ABORTING region server 829678c9d487,43261,1524679749179: Unexpected exception in regionserver/829678c9d487:0.logRoller.replicationSource.wal-reader.829678c9d487%2C43261%2C1524679749179,1 *****
java.lang.NullPointerException
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.filterEntry(ReplicationSourceWALReader.java:283)
        at org.apache.hadoop.hbase.replication.regionserver.SerialReplicationSourceWALReader.readWALEntries(SerialReplicationSourceWALReader.java:93)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceWALReader.run(ReplicationSourceWALReader.java:136)
        at org.apache.hadoop.hbase.replication.regionserver.SerialReplicationSourceWALReader.run(SerialReplicationSourceWALReader.java:34)
{noformat}

Let me dig more.

> Give up replicating anything in S state
> ---------------------------------------
>
>                 Key: HBASE-20426
>                 URL: https://issues.apache.org/jira/browse/HBASE-20426
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Replication
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>            Priority: Major
>             Fix For: HBASE-19064
>
>         Attachments: HBASE-20426-HBASE-19064-v1.patch, 
> HBASE-20426-HBASE-19064-v1.patch, HBASE-20426-HBASE-19064.patch, 
> HBASE-20426-HBASE-19064.patch, HBASE-20426-HBASE-19064.patch, 
> HBASE-20426-UT.patch
>
>
> When we transition the remote S cluster to DA, and then transition the old A 
> cluster to S, it is possible that the old A cluster still has some entries 
> which have not been replicated yet, and then the async replication will be 
> blocked.
> This may also lead to data inconsistency after we transition it back to DA 
> later, as these entries will be replicated again, but the new data which are 
> replicated from the remote cluster will not be replicated back, which 
> introduces a hole in the replication.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
