[ 
https://issues.apache.org/jira/browse/HBASE-19121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16663223#comment-16663223
 ] 

stack commented on HBASE-19121:
-------------------------------

bq. These regions are recorded as OPEN on that crashed regionserver in META, 
that's why I need a tool to help find all these regions that are OPEN in META 
but actually not alive anymore. 

Smile. It's the first item in my list of things we need here: 
https://docs.google.com/document/d/1Y0HIo5yRGXi7nl-JWc69JtxB87fYE-jXe8nBe7HWKe0/edit#heading=h.awq9l5odz77e

I was using the Canary to find these. I'd then unassign the Region -- which 
triggers an SCP for this dead server -- and then after, I'd do a re-assign. 
This usually runs smoothly unless something else holds a lock on the Region 
entity, whether directly or on the containing Table.

A tool to scan hbase:meta looking for servers that are not online and are not 
deadservers might be good. What would it do w/ the info? Queuing an SCP is 
not enough (IIRC) because we don't have the list of what regions were on that 
old dead server, so when the SCP goes to do assigns, it'll have an empty queue. 
Doing an unassign on each of these Regions will trigger a sort of useless SCP 
-- unless we determine it is a long dead and gone server -- though if there are 
WALs to be split, it'll split them. Otherwise, these SCPs will be noops mostly. 
I'd be interested in any thoughts you have here [~tianjingyun].
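
A rough sketch of what such a scan might compute, in Python for brevity. It 
is not HBase code: the (region, state, server) tuple layout, the function 
name, and the sample server names are all made up for illustration. A real 
tool would read hbase:meta and ask the Master for its online and dead server 
lists instead of taking them as arguments.

```python
from collections import defaultdict


def find_unknown_server_regions(meta_rows, online_servers, dead_servers):
    """Group OPEN regions by hosting servers the Master knows nothing about.

    meta_rows is a hypothetical flattened view of hbase:meta: an iterable of
    (region_name, state, server_name) tuples. online_servers and dead_servers
    are collections of server names in the same format.
    """
    known = set(online_servers) | set(dead_servers)
    unknown = defaultdict(list)
    for region, state, server in meta_rows:
        # Only regions that meta claims are OPEN on a server that is neither
        # online nor tracked as dead are of interest here.
        if state == "OPEN" and server not in known:
            unknown[server].append(region)
    return dict(unknown)


if __name__ == "__main__":
    # Illustrative data only; server names follow the host,port,startcode shape.
    rows = [
        ("region-a", "OPEN", "rs1.example.org,16020,1540000000001"),
        ("region-b", "OPEN", "rs2.example.org,16020,1540000000002"),
        ("region-c", "OPEN", "rs3.example.org,16020,1540000000003"),
        ("region-d", "CLOSED", "rs3.example.org,16020,1540000000003"),
    ]
    online = {"rs1.example.org,16020,1540000000001"}
    dead = {"rs2.example.org,16020,1540000000002"}
    for server, regions in find_unknown_server_regions(rows, online, dead).items():
        print(server, regions)
```

Grouping by server keeps the list of Regions that were on each vanished 
server, which is the piece the comment above says a bare SCP would be missing.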

bq. Canary tool can help solve this problem. But it's a little bit slow since 
it needs to read a row from all these regions.

I don't mind it being 'slow'. It actually does this in parallel, so it can be 
pretty fast.

Adding some functionality to the Canary where it recognizes that the server is 
not online, is not in dead servers, and perhaps has no WALs on fs, might be the 
way to go? You'd add a flag for it to actually act on any Regions it found that 
were in the 'wrong' state? It's sort of built to do this sort of review of the 
cluster?

bq. Besides, do we still get a chance to met the problem that region OPEN on 
more than one regionserver?

We don't seem to have this problem any more. I believe it's because the Master 
kills RegionServers that are in disagreement with what it thinks the state of 
affairs is. RegionServers report the Regions they are hosting on each 
heartbeat. Would have to check....

Thanks.


> HBCK for AMv2 (A.K.A HBCK2)
> ---------------------------
>
>                 Key: HBASE-19121
>                 URL: https://issues.apache.org/jira/browse/HBASE-19121
>             Project: HBase
>          Issue Type: Umbrella
>          Components: hbck, hbck2
>            Reporter: stack
>            Assignee: Umesh Agashe
>            Priority: Major
>             Fix For: hbck2-1.0.0
>
>         Attachments: hbase-19121.master.001.patch
>
>
> We don't have an hbck for the new AM. Old hbck may actually do damage going 
> against AMv2.
> Fix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
