[
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663223#comment-16663223
]
stack commented on HBASE-19121:
-------------------------------
bq. These regions are recorded as OPEN on that crashed regionserver in META,
that's why I need a tool to help find all these regions that are OPEN in META
but actually not alive anymore.
Smile. It's the first item in my list of things we need here:
https://docs.google.com/document/d/1Y0HIo5yRGXi7nl-JWc69JtxB87fYE-jXe8nBe7HWKe0/edit#heading=h.awq9l5odz77e
I was using the Canary to find these. I'd then unassign the Region -- which
triggers an SCP for this deadserver -- and then after, I'd do a re-assign.
Usually this runs smoothly unless something else holds a lock on the Region
entity, whether directly or on the containing Table.
A tool to scan hbase:meta looking for servers that are not online and are not
deadservers might be good. What would it do w/ the info? Queuing an SCP is
not enough (IIRC) because we don't have the list of what regions were on that
old dead server, so when the SCP goes to do assigns, it'll have an empty queue.
Doing an unassign on each of these Regions will trigger a sort of useless SCP
-- unless we determine it is a long dead and gone server -- though if there are
WALs to be split, it'll split them. Otherwise, these SCPs will be noops mostly.
I'd be interested in any thoughts you have here [~tianjingyun].
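To make the idea concrete, here is a rough sketch of the check such a tool
might do. It is not HBase code: plain collections stand in for the hbase:meta
scan and for the Master's online and dead server lists, and the class and
method names (UnknownServerScan, findRegionsOnUnknownServers) are made up for
illustration. It only finds the Regions; what to do with them is the open
question above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch: plain collections stand in for hbase:meta and the
// Master's server lists. None of these names exist in HBase.
class UnknownServerScan {

    /**
     * Groups Regions that meta records as OPEN by their hosting server,
     * keeping only servers that are neither online nor in the dead list.
     *
     * @param openRegionToServer region name -> server name, as read from meta
     * @param onlineServers      servers currently online
     * @param deadServers        servers the Master already knows are dead
     * @return unknown server name -> sorted list of Regions recorded on it
     */
    static Map<String, List<String>> findRegionsOnUnknownServers(
            Map<String, String> openRegionToServer,
            Set<String> onlineServers,
            Set<String> deadServers) {
        Map<String, List<String>> result = new TreeMap<>();
        // Iterate in sorted region order so the per-server lists are stable.
        for (Map.Entry<String, String> e : new TreeMap<>(openRegionToServer).entrySet()) {
            String server = e.getValue();
            if (onlineServers.contains(server) || deadServers.contains(server)) {
                continue; // server is accounted for; nothing to report
            }
            result.computeIfAbsent(server, k -> new ArrayList<>()).add(e.getKey());
        }
        return result;
    }
}
```

Keeping the Regions grouped by server means the list of what was on the old
server is in hand, which is the piece a bare SCP would be missing.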
bq. Canary tool can help solve this problem. But it's a little bit slow since
it needs to read a row from all these regions.
I don't mind it being 'slow'. It actually does this in parallel so can be
pretty fast.
Adding some functionality to the Canary where it recognizes that the server is
not online, is not in dead servers, and perhaps has no WALs on fs, might be the
way to go? You'd add a flag for it to actually act on any Regions it found that
were in the 'wrong' state? It's sort of built to do this sort of review of the
cluster?
bq. Besides, do we still get a chance to met the problem that region OPEN on
more than one regionserver?
We don't seem to have this problem any more. I believe it's because the Master
kills RegionServers that are in disagreement with what it thinks the state of
affairs is. RegionServers report the Regions they are hosting on each
heartbeat. Would have to check....
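The comparison I have in mind is roughly the following. It is a sketch under
assumptions, not the Master's actual code: the names (RegionReportCheck,
findDisagreements) are hypothetical, and whether the Master really reacts this
way is the part that needs checking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of comparing a heartbeat report against the Master's
// view. These names are illustrative and do not exist in HBase.
class RegionReportCheck {

    /**
     * Returns the Regions a server reports hosting that the Master's view
     * places on a different server. Regions unknown to the Master are ignored.
     *
     * @param reportingServer server that sent the report
     * @param reportedRegions Regions that server says it is hosting
     * @param masterView      region name -> server the Master believes hosts it
     */
    static List<String> findDisagreements(
            String reportingServer,
            Set<String> reportedRegions,
            Map<String, String> masterView) {
        List<String> disagreements = new ArrayList<>();
        // Sorted iteration keeps the result order deterministic.
        for (String region : new TreeSet<>(reportedRegions)) {
            String expected = masterView.get(region);
            if (expected != null && !expected.equals(reportingServer)) {
                disagreements.add(region);
            }
        }
        return disagreements;
    }
}
```

A non-empty result would be the case where, if my recollection holds, the
reporting RegionServer gets killed rather than the Region staying OPEN in two
places.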
Thanks.
> HBCK for AMv2 (A.K.A HBCK2)
> ---------------------------
>
> Key: HBASE-19121
> URL: https://issues.apache.org/jira/browse/HBASE-19121
> Project: HBase
> Issue Type: Umbrella
> Components: hbck, hbck2
> Reporter: stack
> Assignee: Umesh Agashe
> Priority: Major
> Fix For: hbck2-1.0.0
>
> Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going
> against AMv2.
> Fix.