Andrew Purtell created HBASE-20018:
--------------------------------------
Summary: Safe online META repair
Key: HBASE-20018
URL: https://issues.apache.org/jira/browse/HBASE-20018
Project: HBase
Issue Type: New Feature
Components: hbck
Reporter: Andrew Purtell
HBCK is a tank, or a giant shotgun, or choose the battlefield metaphor you feel
is most appropriate. It rolls onto the field and leaves problems crushed in its
wake, but if you point it in the wrong direction, it will crush your
production data too. As such it is a means of last resort to fix an ailing
cluster. It is also imperative that user request traffic, writes in particular,
be stopped before attempting a number of the fixes. It is unlikely the default
"-repair" option is what you want - this turns on too many fixes to risk at one
time. There are a large number of command line switches for individual checks
and fixes, which are very useful but also error prone when cobbling together a
command line for a cluster fix under pressure. An operations team might
hesitate to employ hbck to fix some accumulating bad state, because of the
disruption its use requires and the risk of compounding the problem if it is
not done carefully. That of course would be bad, because the accumulating bad
state will eventually have an availability impact.
It should be safer to use hbck, but changing hbck also carries risk. We can
leave it be as the useful (but dangerous) tool it is and focus on making a
subset of its functionality safer.
There is a class of META corruptions of mild to moderate severity which could
in theory be handled more safely in an online manner without requiring a
suspension of user traffic. Some things hbck does are safe enough to use
directly for this. Others need tweaks to do more preflight checks (like
checking region states) first. Develop these as a separate tool, maybe even a
new HMaster or Admin component.
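To make the preflight idea concrete, here is a minimal sketch of a gate that
applies a fix only when every check passes. It is purely illustrative and
assumes nothing about HBase internals: RegionState, RegionSnapshot,
not_in_transition and run_repair are hypothetical names, not existing hbck or
HBase APIs, and a real tool would read region state from the cluster.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class RegionState(Enum):
    # Hypothetical, simplified set of states; not HBase's actual state enum.
    OPEN = "OPEN"
    CLOSED = "CLOSED"
    IN_TRANSITION = "IN_TRANSITION"


@dataclass(frozen=True)
class RegionSnapshot:
    # Hypothetical view of one region as seen at preflight time.
    name: str
    state: RegionState


# A preflight check returns None when it passes, or a reason string when it fails.
PreflightCheck = Callable[[RegionSnapshot], Optional[str]]


def not_in_transition(region: RegionSnapshot) -> Optional[str]:
    """Refuse to touch a region whose state is still changing."""
    if region.state is RegionState.IN_TRANSITION:
        return f"{region.name} is in transition"
    return None


def run_repair(
    region: RegionSnapshot,
    checks: List[PreflightCheck],
    fix: Callable[[RegionSnapshot], None],
) -> List[str]:
    """Apply `fix` only if every check passes.

    Returns the list of failure reasons; an empty list means the fix ran.
    """
    failures = [reason for reason in (check(region) for check in checks)
                if reason is not None]
    if not failures:
        fix(region)
    return failures
```

A caller would collect the returned reasons and report them instead of forcing
the fix, so a region that is mid-transition is skipped rather than modified.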
Look for opportunities to share code with existing hbck by refactoring it into
a shared library.