[ https://issues.apache.org/jira/browse/HBASE-28151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776353#comment-17776353 ]
Andrew Kyle Purtell edited comment on HBASE-28151 at 10/17/23 7:19 PM:
-----------------------------------------------------------------------
Let's not bring back hbck1-style complex arguments. They were one of the design
mistakes we made with hbck1.
Consider the hbck2 documentation
(https://github.com/apache/hbase-operator-tools/blob/master/hbase-hbck2/README.md#philosophy):
{quote}
HBCK2 performs a single, discrete task each time it is run. It does not presume
a tool can analyze all about the running cluster and then repair 'all problems'
found as hbck1 used to suggest.
HBCK2 is for fixes.
{quote}
We can derive some requirements from that philosophy:
- Simplicity and predictability. Each command does a single, discrete task each
time it is run. Options are only added if absolutely necessary.
- Commands have clear and simple names.
- Command arguments have simple names. The current implementation shows they
are all UNIX-like; maintain this naming convention, where each option has a
short, single-character name and a long name. Consider this resource:
https://nullprogram.com/blog/2020/08/01/
If we want bypassing to keep the preflight checks unless explicitly told
otherwise, provide a simple argument like '-f' (force) in addition to '-o'
(bypass). When the -f option is not provided, the preflight check still runs.
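A minimal sketch of how such a command could parse these flags, assuming the long names `--override` and `--force` (the comment only names '-o' and '-f') and a hypothetical class `AssignsOptions`; this is not HBCK2's actual option parser. Rejecting '-f' without '-o' is also an assumption, on the reading that force only modifies a bypass.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical parser for the flags proposed above. Names are illustrative
 * only; this is not HBCK2's real option handling.
 */
final class AssignsOptions {
  final boolean override;     // -o / --override: bypass the attached procedure
  final boolean force;        // -f / --force: additionally skip the preflight check
  final List<String> regions; // positional arguments (region names)

  private AssignsOptions(boolean override, boolean force, List<String> regions) {
    this.override = override;
    this.force = force;
    this.regions = regions;
  }

  /** The preflight check is kept unless -f was given. */
  boolean runPreflightCheck() {
    return !force;
  }

  static AssignsOptions parse(String... args) {
    boolean override = false;
    boolean force = false;
    List<String> regions = new ArrayList<>();
    for (String arg : args) {
      switch (arg) {
        case "-o":
        case "--override":
          override = true;
          break;
        case "-f":
        case "--force":
          force = true;
          break;
        default:
          if (arg.startsWith("-")) {
            throw new IllegalArgumentException("Unknown option: " + arg);
          }
          regions.add(arg);
      }
    }
    // Assumption: -f only modifies a bypass, so it is rejected without -o.
    if (force && !override) {
      throw new IllegalArgumentException("-f/--force requires -o/--override");
    }
    return new AssignsOptions(override, force, regions);
  }
}
```

Under this sketch, `assigns -o <region>` would bypass the attached procedure but still run the preflight check, and only `assigns -o -f <region>` would skip it.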
The procedure framework and implementations will require updates to incorporate
the distinction between bypass with preflight checks and bypass without
preflight checks.
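One way the framework could carry that distinction is a single mode value handed to the procedure instead of one boolean. This is a hypothetical sketch: `BypassMode`, `ToyRegionState`, and `ToyAssign` are illustrative names, not HBase's procedure API, and the preflight rule (never reopen a region that has already been split) is a toy stand-in for the real checks.

```java
/** Hypothetical bypass modes; not part of HBase's procedure framework. */
enum BypassMode {
  NONE,                    // no bypass: normal path, preflight check runs
  OVERRIDE,                // detach the attached procedure, preflight check still runs
  OVERRIDE_SKIP_PREFLIGHT; // detach the attached procedure and skip the preflight check

  static BypassMode of(boolean override, boolean force) {
    if (force && !override) {
      throw new IllegalArgumentException("force is only meaningful with override");
    }
    if (!override) {
      return NONE;
    }
    return force ? OVERRIDE_SKIP_PREFLIGHT : OVERRIDE;
  }

  boolean detachAttachedProcedure() {
    return this != NONE;
  }

  boolean skipPreflightCheck() {
    return this == OVERRIDE_SKIP_PREFLIGHT;
  }
}

/** Toy region states; SPLIT stands in for a parent whose split already completed. */
enum ToyRegionState { CLOSED, OPEN, SPLIT }

final class ToyAssign {
  /** Toy preflight check: never reopen a region that has already been split. */
  static boolean preflightPasses(ToyRegionState state) {
    return state != ToyRegionState.SPLIT;
  }

  /** Returns true if an assign would be scheduled for the region under {@code mode}. */
  static boolean wouldAssign(ToyRegionState state, boolean procedureAttached, BypassMode mode) {
    if (procedureAttached && !mode.detachAttachedProcedure()) {
      return false; // another procedure owns the region and no bypass was requested
    }
    return mode.skipPreflightCheck() || preflightPasses(state);
  }
}
```

Modeling it this way makes "detach the attached procedure" and "skip the preflight check" two separately testable properties rather than side effects of one flag.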
> hbck -o should not allow bypassing pre transit check by default
> ---------------------------------------------------------------
>
> Key: HBASE-28151
> URL: https://issues.apache.org/jira/browse/HBASE-28151
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.4.17, 2.5.5
> Reporter: Viraj Jasani
> Priority: Major
>
> When an operator uses hbck assigns or unassigns with "-o", the override also
> skips the pre-transit checks. While this is one of the intentions of "-o",
> its primary purpose should be only to detach the existing procedure from the
> RegionStateNode so that the newly scheduled assign procedure can take the
> exclusive region-level lock.
> We should restrict bypassing preTransitCheck by making it available only as a
> site config.
> Only if bypassing preTransitCheck is configured should hbck "-o" be allowed to
> bypass this check; otherwise, by default, "-o" should still go through the
> check.
>
> It is important to keep "unsetting the procedure from the RegionStateNode" and
> "bypassing preTransitCheck" separate so that, when the cluster state is bad,
> we don't make it worse. For example, if a region was successfully split and
> the operator then runs "hbck assigns \{region} -o", bypassing the transit
> check would let the master bring that region online again. The region could
> then compact its store files and archive a store file that is still
> referenced by a daughter region, which would prevent the daughter region from
> coming online.
> Let's introduce an HBase site config to allow bypassing preTransitCheck; an
> operator should not be able to do this with hbck alone.
>
> "-o" should mean "override" the procedure attached to the RegionStateNode; it
> should not mean forcefully skipping any region transition validation checks.
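A minimal sketch of the gating the description asks for, assuming the site configuration can be read as a string map: "-o" always detaches the attached procedure, but the pre-transit check is skipped only when the site config also allows it. The key `hbase.hbck.override.bypass.pretransitcheck` and the class `PreTransitCheckPolicy` are hypothetical; the issue does not name the config, and a real implementation would read it from the Hadoop `Configuration` that HBase uses.

```java
import java.util.Map;

/** Hypothetical policy object; not an HBase class. */
final class PreTransitCheckPolicy {
  // Hypothetical key name: the issue proposes a site config but does not name it.
  static final String BYPASS_KEY = "hbase.hbck.override.bypass.pretransitcheck";

  private final boolean bypassAllowedBySite;

  PreTransitCheckPolicy(Map<String, String> siteConfig) {
    // A missing key yields null, which parses as false, so by default
    // "-o" never skips the pre-transit check.
    this.bypassAllowedBySite = Boolean.parseBoolean(siteConfig.get(BYPASS_KEY));
  }

  /** "-o" always detaches the procedure attached to the RegionStateNode. */
  boolean detachAttachedProcedure(boolean override) {
    return override;
  }

  /** The pre-transit check is skipped only if both "-o" and the site config allow it. */
  boolean skipPreTransitCheck(boolean override) {
    return override && bypassAllowedBySite;
  }
}
```

With the default configuration, "hbck assigns \{region} -o" on an already-split region would still detach a stuck procedure, while the pre-transit check would remain in place to guard against the scenario described above.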