[ 
https://issues.apache.org/jira/browse/HBASE-20018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell resolved HBASE-20018.
-----------------------------------------
    Resolution: Not A Problem

Some of these ideas were implemented in HbckChore in HBase v2, and the 
remainder of this issue is old.

> Safe online META repair
> -----------------------
>
>                 Key: HBASE-20018
>                 URL: https://issues.apache.org/jira/browse/HBASE-20018
>             Project: HBase
>          Issue Type: New Feature
>          Components: hbck
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>
> HBCK is a tank, or a giant shotgun, or choose the battlefield metaphor you 
> feel is most appropriate. It rolls onto the field and leaves problems crushed 
> in its wake, but if you point it in the wrong direction, it will crush 
> your production data too. As such it is a means of last resort to fix an 
> ailing cluster. It is also imperative that user request traffic, writes in 
> particular, is stopped before attempting a number of the fixes. It is 
> unlikely the default "-repair" option is what you want - this turns on too 
> many fixes to risk at one time. There are a large number of command line 
> switches for individual checks and fixes, which are very useful but also 
> error-prone when cobbling together a command line for a cluster fix under 
> pressure. An operations team might hesitate to employ hbck to fix some 
> accumulating bad state because of the disruption its use requires and the 
> risk of compounding the problem if not carefully done. That of course would be bad 
> because the accumulating bad state will eventually have an availability 
> impact. 
> It should be safer to use hbck, but changing hbck also carries risk. We can 
> leave it be as the useful (but dangerous) tool it is and focus on making a 
> subset of its functionality safer.
> There is a class of META corruptions of mild to moderate severity which 
> could in theory be handled more safely in an online manner without requiring 
> a suspension of user traffic. Some things hbck does are safe enough to use 
> directly for this. Others need tweaks to do more preflight checks (like 
> checking region states) first. Develop these as a separate tool, maybe even a 
> new HMaster or Admin component.
> Look for opportunities to share code with existing hbck by refactoring into a 
> shared library. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
