[
https://issues.apache.org/jira/browse/HBASE-26797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17501016#comment-17501016
]
Bryan Beaudreault commented on HBASE-26797:
-------------------------------------------
I wrote an internal acceptance test that points our HBase 1 client at one of
our HBase 2 clusters and sets things up so that an orphaned rep_barrier row
exists. The test proves that this change fixes the issue. Unfortunately I
can't share the code, but it effectively does the following:
* Create a table with a single CF
* Add REPLICATION_SCOPE = 1 to CF, which will result in rep_barrier rows
* Split the table twice (resulting in 3 regions, with split points '2222' and
'5555')
* Run the catalog janitor, which cleans up the parent records, leaving just
the orphaned rep_barrier rows
* Merge the first 2 regions, so now there are just 2 regions with split point
'5555'
* Run the catalog janitor again, further cleaning up old records
* Call RegionLocator.getRegionLocation('2222', true)
** This fails pre-patch, but succeeds post-patch
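For readers without access to the test, the failing lookup in the last step
can be illustrated with a toy model. The sketch below is not HBase code and
makes several simplifying assumptions: meta is a sorted map from row key to
the column families that still hold cells, a lookup is a reverse scan starting
at 'table,row,99999999999999', and the client uses the first row the scan
returns. The class name, the table name 't', the timestamps and the encoded
region names are all made up for illustration, and plain string ordering
stands in for the real meta comparator.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: plain collections, no HBase classes. All names are hypothetical.
class MetaLookupSketch {
    static final String INFO = "info";
    static final String REP_BARRIER = "rep_barrier";

    private static Set<String> families(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    // Assumed meta contents after the split/merge/janitor steps:
    // row key -> families that still have cells in that row.
    static NavigableMap<String, Set<String>> metaAfterMerge() {
        NavigableMap<String, Set<String>> meta = new TreeMap<>();
        meta.put("t,,1000.aaa.", families(REP_BARRIER));           // merged-away region, orphaned
        meta.put("t,2222,1000.bbb.", families(REP_BARRIER));       // merged-away region, orphaned
        meta.put("t,,2000.ccc.", families(INFO, REP_BARRIER));     // merged region, up to '5555'
        meta.put("t,5555,1000.ddd.", families(INFO, REP_BARRIER)); // region starting at '5555'
        return meta;
    }

    // Reverse scan from the search key and use the first row the scan returns.
    // With infoOnly set, rows without 'info' cells are never returned, which
    // models restricting the scan to the catalog family.
    static String locate(NavigableMap<String, Set<String>> meta, String row, boolean infoOnly) {
        String searchKey = "t," + row + ",99999999999999";
        for (Map.Entry<String, Set<String>> e
                : meta.headMap(searchKey, true).descendingMap().entrySet()) {
            if (infoOnly && !e.getValue().contains(INFO)) {
                continue; // a family-restricted scan skips this row entirely
            }
            if (!e.getValue().contains(INFO)) {
                // models the unfiltered client failing to extract a region location
                throw new IllegalStateException("no region info in meta row " + e.getKey());
            }
            return e.getKey();
        }
        throw new IllegalStateException("no meta row found for " + row);
    }

    public static void main(String[] args) {
        NavigableMap<String, Set<String>> meta = metaAfterMerge();
        System.out.println("filtered:   " + locate(meta, "2222", true));
        try {
            locate(meta, "2222", false);
        } catch (IllegalStateException e) {
            System.out.println("unfiltered: " + e.getMessage());
        }
    }
}
```

In this model the unfiltered lookup for '2222' lands on the orphaned
't,2222,...' row and fails, while the filtered lookup skips it and returns the
merged region's row.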
> HBase 1.x clients will choke on rep_barrier rows when scanning hbase 2.x meta
> -----------------------------------------------------------------------------
>
> Key: HBASE-26797
> URL: https://issues.apache.org/jira/browse/HBASE-26797
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.7.1
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Major
> Labels: patch-available
>
> In hbase 2.x, support for serial replication included adding a new CF to meta
> called rep_barrier. When regions are split or merged, these rep_barrier rows
> will not be cleaned up. Instead there's a ReplicationBarrierCleaner chore
> which runs every 12 hours. HBase 2.x clients will ignore these rep_barrier
> rows, per the [addFamily call in
> locateRegionInMeta|https://github.com/apache/hbase/blob/branch-2/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionImplementation.java#L929].
> Encountering these orphan rep_barrier rows causes the hbase 1.x client to
> fail when it [tries to extract the region location from the meta
> row|https://github.com/apache/hbase/blob/branch-1/hbase-client/src/main/java/org/apache/hadoop/hbase/client/ConnectionManager.java#L1340-L1344].
> This is a non-recoverable exception, so retries will fail and it will
> eventually bubble up.
> The immediate fix when encountering this is to run {{hbck2 fixMeta}}, but
> we should fix the hbase 1.x client to similarly filter on the CATALOG_FAMILY
> to avoid these issues altogether.