[
https://issues.apache.org/jira/browse/HBASE-20671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16584709#comment-16584709
]
Tak Lon (Stephen) Wu commented on HBASE-20671:
----------------------------------------------
Hi all, I am not 100% sure yet, but I recently set {{hbase.readonly}} to true
on hbase-2.1.0 for a read replica cluster and found that {{hbase:namespace}}
cannot be assigned during the read replica cluster startup: the master loops
forever because {{isTableAssigned}} keeps checking the {{hbase:namespace}}
table and always returns false.
The patch for HBASE-20702 skips `empty` rows, but it seems that rows for
system table(s), e.g. {{hbase:namespace}}, should not be considered empty.
With the band-aid change below, the cluster was able to start again.
{noformat}
private void loadMeta() throws IOException {
  // TODO: use a thread pool
  regionStateStore.visitMeta(new RegionStateStore.RegionStateVisitor() {
    @Override
    public void visitRegionState(Result result, final RegionInfo regionInfo, final State state,
        final ServerName regionLocation, final ServerName lastHost, final long openSeqNum) {
      // <-- added to unblock the read replica cluster
      if (!regionInfo.getTable().equals(TableName.NAMESPACE_TABLE_NAME)) {
        if (state == null && regionLocation == null && lastHost == null
            && openSeqNum == SequenceId.NO_SEQUENCE_ID) {
          // This is a row with nothing in it.
          LOG.warn("Skipping empty row={}", result);
          return;
        }
      }
{noformat}
So, do you think this should be fixed somewhere else instead?
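To make the intent of the band-aid easier to discuss, here is a minimal standalone sketch of the decision it encodes. It is not HBase code: the class {{EmptyRowCheck}}, its method names, the use of plain strings for the table name, state and servers, and the {{-1}} sentinel standing in for {{SequenceId.NO_SEQUENCE_ID}} are all assumptions made for illustration. It also exempts every table in the {{hbase}} namespace, which is broader than the band-aid above (that one only exempts {{hbase:namespace}}).

```java
/**
 * Hypothetical, simplified stand-in for the empty-row check in loadMeta().
 * None of these names are part of the HBase API.
 */
final class EmptyRowCheck {
    // Assumed sentinel playing the role of SequenceId.NO_SEQUENCE_ID.
    static final long NO_SEQUENCE_ID = -1L;

    // Assumption: system tables are the ones living in the "hbase" namespace.
    static final String SYSTEM_NAMESPACE_PREFIX = "hbase:";

    private EmptyRowCheck() {
    }

    /** True when the table name belongs to the assumed system namespace. */
    static boolean isSystemTable(String tableName) {
        return tableName.startsWith(SYSTEM_NAMESPACE_PREFIX);
    }

    /** True when the meta row carries no state, no locations and no sequence id. */
    static boolean isEmptyRow(String state, String regionLocation, String lastHost,
            long openSeqNum) {
        return state == null && regionLocation == null && lastHost == null
            && openSeqNum == NO_SEQUENCE_ID;
    }

    /** Skip only empty rows of user tables; rows of system tables are always loaded. */
    static boolean shouldSkip(String tableName, String state, String regionLocation,
            String lastHost, long openSeqNum) {
        return !isSystemTable(tableName)
            && isEmptyRow(state, regionLocation, lastHost, openSeqNum);
    }

    public static void main(String[] args) {
        // Empty row of a user table versus the same empty row for hbase:namespace.
        System.out.println(shouldSkip("tabletwo_merge", null, null, null, NO_SEQUENCE_ID));
        System.out.println(shouldSkip("hbase:namespace", null, null, null, NO_SEQUENCE_ID));
    }
}
```

Under these assumptions an empty row is skipped for a user table such as {{tabletwo_merge}} but still loaded for {{hbase:namespace}}, which is the behaviour that let the read replica cluster start. Whether the exemption should cover all system tables or only the namespace table is exactly the open question.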
> Merged region brought back to life causing RS to be killed by Master
> --------------------------------------------------------------------
>
> Key: HBASE-20671
> URL: https://issues.apache.org/jira/browse/HBASE-20671
> Project: HBase
> Issue Type: Bug
> Components: amv2
> Affects Versions: 2.0.0
> Reporter: Josh Elser
> Assignee: Josh Elser
> Priority: Major
> Attachments: 0001-Test-for-HBASE-20671.patch,
> hbase-hbase-master-ctr-e138-1518143905142-336066-01-000003.hwx.site.log.zip,
> hbase-hbase-regionserver-ctr-e138-1518143905142-336066-01-000002.hwx.site.log.zip,
> workaround.txt
>
>
> Another bug coming out of a master restart and replay of the pv2 logs.
> The master merged two regions into one successfully, was restarted, but then
> ended up assigning the children region back out to the cluster. There is a
> log message which appears to indicate that RegionStates acknowledges that it
> doesn't know what this region is as it's replaying the pv2 WAL; however, it
> incorrectly assumes that the region is just OFFLINE and needs to be assigned.
> {noformat}
> 2018-05-30 04:26:00,055 INFO
> [RpcServer.default.FPBQ.Fifo.handler=29,queue=2,port=20000] master.HMaster:
> Client=hrt_qa//172.27.85.11 Merge regions a7dd6606dcacc9daf085fc9fa2aecc0c
> and 4017a3c778551d4d258c785d455f9c0b
> 2018-05-30 04:28:27,525 DEBUG
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> procedure2.ProcedureExecutor: Completed pid=4368, state=SUCCESS;
> MergeTableRegionsProcedure table=tabletwo_merge,
> regions=[a7dd6606dcacc9daf085fc9fa2aecc0c, 4017a3c778551d4d258c785d455f9c0b],
> forcibly=false
> {noformat}
> {noformat}
> 2018-05-30 04:29:20,263 INFO
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> assignment.AssignmentManager: a7dd6606dcacc9daf085fc9fa2aecc0c
> regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,263 INFO
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!!
> rit=OFFLINE, location=null, table=tabletwo_merge,
> region=a7dd6606dcacc9daf085fc9fa2aecc0c
> 2018-05-30 04:29:20,266 INFO
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> assignment.AssignmentManager: 4017a3c778551d4d258c785d455f9c0b
> regionState=null; presuming OFFLINE
> 2018-05-30 04:29:20,266 INFO
> [master/ctr-e138-1518143905142-336066-01-000003:20000]
> assignment.RegionStates: Added to offline, CURRENTLY NEVER CLEARED!!!
> rit=OFFLINE, location=null, table=tabletwo_merge,
> region=4017a3c778551d4d258c785d455f9c0b
> {noformat}
> Eventually, the RS reports in its online regions, and the master tells it to
> kill itself:
> {noformat}
> 2018-05-30 04:29:24,272 WARN
> [RpcServer.default.FPBQ.Fifo.handler=26,queue=2,port=20000]
> assignment.AssignmentManager: Killing
> ctr-e138-1518143905142-336066-01-000002.hwx.site,16020,1527654546619: Not
> online: tabletwo_merge,,1527652130538.a7dd6606dcacc9daf085fc9fa2aecc0c.
> {noformat}