Andrew Purtell created HBASE-20018:
--------------------------------------

             Summary: Safe online META repair
                 Key: HBASE-20018
                 URL: https://issues.apache.org/jira/browse/HBASE-20018
             Project: HBase
          Issue Type: New Feature
          Components: hbck
            Reporter: Andrew Purtell


HBCK is a tank, or a giant shotgun, or choose the battlefield metaphor you feel 
is most appropriate. It rolls onto the field and leaves problems crushed in its 
wake, but if you point it in the wrong direction, it will crush your 
production data too. As such it is a means of last resort to fix an ailing 
cluster. It is also imperative that user request traffic, writes in particular, 
be stopped before attempting a number of the fixes. The default "-repair" 
option is unlikely to be what you want: it turns on too many fixes to risk at 
one time. There are a large number of command line switches for individual 
checks and fixes, which are very useful but also error prone when cobbling 
together a command line for a cluster fix under pressure. An operations team 
might hesitate to employ hbck to fix some accumulating bad state because of 
the disruption its use requires and the risk of compounding the problem if it 
is not done carefully. That of course would be bad, because the accumulating 
bad state will eventually have an availability impact. 

It should be safer to use hbck, but changing hbck also carries risk. We can 
leave it be as the useful (but dangerous) tool it is and focus on making a 
subset of its functionality safer.

There is a class of META corruptions of mild to moderate severity which could 
in theory be handled more safely in an online manner without requiring a 
suspension of user traffic. Some things hbck does are safe enough to use 
directly for this. Others need tweaks to do more preflight checks (like 
checking region states) first. Develop these as a separate tool, maybe even a 
new HMaster or Admin component.
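
As a rough illustration of the preflight idea, here is a minimal sketch in 
Java. Everything in it is an assumption made for illustration: the names 
OnlineRepairPreflight, RegionState, Decision, and preflight are hypothetical 
and are not existing HBase or hbck APIs, and the list of states is a 
simplification. It only shows the gate itself: skip a region whose state 
suggests it is mid-transition, proceed otherwise.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch: none of these types are HBase or hbck APIs.
class OnlineRepairPreflight {

    // Simplified region states, assumed for illustration only.
    enum RegionState { OPEN, CLOSED, OFFLINE, OPENING, CLOSING, SPLITTING, MERGING }

    // Outcome of the preflight check for a single region.
    enum Decision { PROCEED, SKIP_AND_RETRY_LATER }

    // States in which the region is mid-transition, so changing its META row
    // online could race with whatever is moving the region.
    static final Set<RegionState> IN_TRANSITION = EnumSet.of(
        RegionState.OPENING, RegionState.CLOSING,
        RegionState.SPLITTING, RegionState.MERGING);

    // Allow an online fix only when the region is in a settled state.
    static Decision preflight(RegionState observed) {
        return IN_TRANSITION.contains(observed)
            ? Decision.SKIP_AND_RETRY_LATER
            : Decision.PROCEED;
    }

    public static void main(String[] args) {
        // Print the decision for every state in the simplified model.
        for (RegionState s : RegionState.values()) {
            System.out.println(s + " -> " + preflight(s));
        }
    }
}
```

A real tool would presumably also need to re-check the state immediately 
before applying a fix, since it can change between the check and the repair.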

Look for opportunities to share code with existing hbck by refactoring it into 
a shared library. 


