[ 
https://issues.apache.org/jira/browse/HBASE-28151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Jasani updated HBASE-28151:
---------------------------------
    Description: 
When operator uses hbck assigns or unassigns with "-o", the override will also 
skip pre transit checks. While this is one of the intentions with "-o", the 
primary purpose should still be to only unattach existing procedure from 
RegionStateNode so that newly scheduled assign proc can take exclusive region 
level lock.

We should restrict bypassing preTransitCheck by only providing it as site 
config.

If bypassing preTransitCheck is configured, only then any hbck "-o" should be 
allowed to bypass this check, otherwise by default they should go through the 
check.

 

It is important to keep "unset of the procedure from RegionStateNode" and 
"bypassing preTransitCheck" separate so that when the cluster state is bad, we 
don't explicitly deteriorate it further e.g. if a region was successfully split 
and now if operator performs "hbck assigns \{region} -o" and if it bypasses the 
transit check, master would bring the region online and it could compact store 
files and archive the store file which is referenced by daughter region. This 
would not allow daughter region to come online.

Let's introduce hbase site config to allow bypassing preTransitCheck, it should 
not be doable only by operator using hbck alone.

 

"-o" should mean "override" the procedure that is attached to the 
RegionStateNode, it should not mean forcefully skip any region transition 
validation checks and perform the region assignments.

  was:
When operator uses hbck assigns or unassigns with "-o", the override will also 
skip pre transit checks. While this is one of the intentions with "-o", the 
primary purpose should still be to only unattach existing procedure from 
RegionStateNode so that newly scheduled assign proc can take exclusive region 
level lock.

We should restrict bypassing preTransitCheck by only providing it as site 
config.

If bypassing preTransitCheck is configured, only then any hbck "-o" should be 
allowed to bypass this check, otherwise by default they should go through the 
check.

 

It is important to keep "unset of the procedure from RegionStateNode" and 
"bypassing preTransitCheck" separate so that when the cluster state is bad, we 
don't explicitly deteriorate it further e.g. if a region was successfully split 
and now if operator performs "hbck assigns \{region} -o" and if it bypasses the 
transit check, master would bring the region online and it could compact store 
files and archive the store file which is referenced by daughter region. This 
would not allow daughter region to come online.

Let's introduce hbase site config to allow bypassing preTransitCheck, it should 
not be doable only by operator using hbck alone.


> hbck -o should not allow bypassing pre transit check by default
> ---------------------------------------------------------------
>
>                 Key: HBASE-28151
>                 URL: https://issues.apache.org/jira/browse/HBASE-28151
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 2.4.17, 2.5.5
>            Reporter: Viraj Jasani
>            Priority: Major
>
> When operator uses hbck assigns or unassigns with "-o", the override will 
> also skip pre transit checks. While this is one of the intentions with "-o", 
> the primary purpose should still be to only unattach existing procedure from 
> RegionStateNode so that newly scheduled assign proc can take exclusive region 
> level lock.
> We should restrict bypassing preTransitCheck by only providing it as site 
> config.
> If bypassing preTransitCheck is configured, only then any hbck "-o" should be 
> allowed to bypass this check, otherwise by default they should go through the 
> check.
>  
> It is important to keep "unset of the procedure from RegionStateNode" and 
> "bypassing preTransitCheck" separate so that when the cluster state is bad, 
> we don't explicitly deteriorate it further e.g. if a region was successfully 
> split and now if operator performs "hbck assigns \{region} -o" and if it 
> bypasses the transit check, master would bring the region online and it could 
> compact store files and archive the store file which is referenced by 
> daughter region. This would not allow daughter region to come online.
> Let's introduce hbase site config to allow bypassing preTransitCheck, it 
> should not be doable only by operator using hbck alone.
>  
> "-o" should mean "override" the procedure that is attached to the 
> RegionStateNode, it should not mean forcefully skip any region transition 
> validation checks and perform the region assignments.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to