[
https://issues.apache.org/jira/browse/HBASE-23282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035800#comment-17035800
]
Fabrice Rabaute commented on HBASE-23282:
-----------------------------------------
Hi,
I'm having an issue where an "Unknown Server" is reported. I upgraded from
2.2.1 to 2.2.3,
but I still get this Unknown Server even after running an SCP.
My region info is as follows:
{code:java}
COLUMN CELL
...
info:server timestamp=1581391440980,
value=regionserver-2.hbase.hbase.svc.cluster.local:16020
info:serverstartcode timestamp=1581391440980, value=1573519312100
info:sn timestamp=1581549272576,
value=regionserver-0.hbase.hbase.svc.cluster.local,16020,1581546727391
info:state timestamp=1581549272576, value=OPENING
....
{code}
I don't know what server/serverstartcode/sn mean, but they don't seem to match:
the startcodes are different. Is that expected?
In the HBCK UI, I have this info for the "Inconsistent Regions" reported:
{code:java}
encoded region: 353ab75c788cd0f77027706900453c49
location in META:
regionserver-2.hbase.hbase.svc.cluster.local,16020,1581546563369
{code}
I have this info for the "Unknown Servers" reported:
{code:java}
RegionInfo: 353ab75c788cd0f77027706900453c49
ServerName: regionserver-2.hbase.hbase.svc.cluster.local,16020,1573519312100
{code}
It means that I have 3 regionservers reported for this region based on the data.
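As a rough cross-check of that count, the ServerName strings quoted above can
be parsed and deduplicated. This is only a sketch in Python: the `ServerName`
tuple and both helper functions are illustrative names, not HBase APIs, and it
assumes the `host,port,startcode` layout seen in the HBCK output.

```python
from typing import NamedTuple


class ServerName(NamedTuple):
    """Illustrative (host, port, startcode) triple; not an HBase class."""
    host: str
    port: int
    startcode: int


def parse_server_name(text: str) -> ServerName:
    # Assumes the "host,port,startcode" form shown in the HBCK report.
    host, port, startcode = text.strip().split(",")
    return ServerName(host, int(port), int(startcode))


def from_server_columns(server: str, startcode: str) -> ServerName:
    # Combines info:server ("host:port") with info:serverstartcode.
    host, port = server.rsplit(":", 1)
    return ServerName(host, int(port), int(startcode))


# Values copied from the output quoted in this comment.
reported = {
    "info:server + info:serverstartcode": from_server_columns(
        "regionserver-2.hbase.hbase.svc.cluster.local:16020", "1573519312100"),
    "info:sn": parse_server_name(
        "regionserver-0.hbase.hbase.svc.cluster.local,16020,1581546727391"),
    "HBCK location in META": parse_server_name(
        "regionserver-2.hbase.hbase.svc.cluster.local,16020,1581546563369"),
    "HBCK Unknown Servers": parse_server_name(
        "regionserver-2.hbase.hbase.svc.cluster.local,16020,1573519312100"),
}

# Deduplicate: identical host, port, and startcode count as one server.
distinct = sorted(set(reported.values()), key=lambda s: s.startcode)

for name in distinct:
    sources = [label for label, value in reported.items() if value == name]
    print(f"{name.host},{name.port},{name.startcode} <- {', '.join(sources)}")
```

With these inputs the four reported values collapse to three distinct
ServerNames, because the "Unknown Servers" entry matches the
info:server/info:serverstartcode pair.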
Is there an automated or manual procedure to recover from such a state?
Thanks.
> HBCKServerCrashProcedure for 'Unknown Servers'
> ----------------------------------------------
>
> Key: HBASE-23282
> URL: https://issues.apache.org/jira/browse/HBASE-23282
> Project: HBase
> Issue Type: Bug
> Components: hbck2, proc-v2
> Affects Versions: 2.2.2
> Reporter: Michael Stack
> Assignee: Michael Stack
> Priority: Major
> Fix For: 3.0.0, 2.3.0, 2.2.3
>
>
> With an overdriving, sustained load, I can fairly easily manufacture an
> hbase:meta table that references servers that are no longer in the live list
> nor are members of deadservers; i.e. 'Unknown Servers'. The new 'HBCK
> Report' UI in Master has a section where it lists 'Unknown Servers' if any in
> hbase:meta.
> Once in this state, the repair is awkward. Our assign/unassign Procedure is
> particularly dogged about insisting that we confirm close/open of Regions
> when it is going about its business, which is well and good if the server is
> in the live/dead sets, but with an 'Unknown Server' we invariably end up
> trying to confirm against a no-longer-present server (more on this in
> follow-on issues).
> What is wanted is queuing of a ServerCrashProcedure for each 'Unknown
> Server'. It would split any WALs (there shouldn't be any if server was
> restarted) and ideally it would cancel out any assigns and reassign regions
> off the 'Unknown Server'. But the 'normal' SCP consults the in-memory
> cluster state to figure out what Regions were on the crashed server, which
> works fine for the usual case. 'Unknown Servers', however, have no state in
> the Master's in-memory Maps of Servers to Regions or in the DeadServers list.
> Suggestion here is that hbck2 be able to drive in a special SCP, one which
> would get list of Regions by scanning hbase:meta rather than asking Master
> memory; an HBCKSCP.
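
The difference between the two lookups described in the issue can be
illustrated with a toy model. This is a minimal sketch, not the actual HBCKSCP
implementation: the dictionaries stand in for the Master's in-memory map and
for hbase:meta, the region names `region-a` and `region-b` are hypothetical,
and matching on the full ServerName string is an assumption made for
illustration.

```python
from typing import Dict, List

# ServerName of the 'Unknown Server' from the comment above.
UNKNOWN = "regionserver-2.hbase.hbase.svc.cluster.local,16020,1573519312100"
LIVE = "regionserver-0.hbase.hbase.svc.cluster.local,16020,1581546727391"

# Toy stand-in for the Master's in-memory map of servers to regions.
# The unknown server has no entry here, which is the problem described.
master_memory: Dict[str, List[str]] = {
    LIVE: ["region-b"],
}

# Toy stand-in for hbase:meta: region name -> ServerName recorded in its row.
meta_rows: Dict[str, str] = {
    "region-a": UNKNOWN,  # still references the unknown server
    "region-b": LIVE,
}


def regions_from_memory(server: str) -> List[str]:
    """What a 'normal' SCP would see: only regions tracked in memory."""
    return sorted(master_memory.get(server, []))


def regions_from_meta(server: str) -> List[str]:
    """What a meta-scanning SCP would see: every row naming the server."""
    return sorted(r for r, location in meta_rows.items() if location == server)


print("from memory:", regions_from_memory(UNKNOWN))
print("from meta:  ", regions_from_meta(UNKNOWN))
```

In this model the in-memory lookup finds nothing to reassign for the unknown
server, while the scan over the meta rows still finds the region that
references it.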