[ 
https://issues.apache.org/jira/browse/HBASE-15406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15314814#comment-15314814
 ] 

Enis Soztutar commented on HBASE-15406:
---------------------------------------

I've looked at this again, especially in relation to disabling the catalog 
janitor from HBCK in HBASE-15940. The patch as it stands only handles the 
split / merge switch and not the balancer (which is also disabled in master). 
I think we should disable the catalog janitor as well. But I think we should 
simplify this patch before 1.3 is released, since it is too hard to understand 
what is going on. The switches have 3 states? We call it a "lock", but save 
state there and then switch the state back? Sorry, but I think this is way too 
complex to be released. I thought the plan was to use an ephemeral node to 
track the active HBCK, but the final patch ended up doing something else. 

The problem we are trying to solve is that during HBCK runs or other "admin" 
operations, the balancer, catalog janitor and split/merge should not be 
running. The difficulty is that the HBCK run is not tracked by the master, so 
if we disable these switches, they can be left disabled when the HBCK run is 
aborted. 

Can we revert this patch and solve the root cause of the problem instead of 
adding all of this complexity? I propose we add a "Maintenance Mode" in master, 
similar to the region split / merge, balancer and other switches. Maintenance 
mode will effectively put all other switches in disabled mode. When an 
admin / HBCK puts the master in maintenance mode, she can optionally supply an 
ephemeral znode path that the master will watch. As soon as all ephemeral nodes 
go away, the master will leave maintenance mode. Every instance of HBCK 
creates an ephemeral znode, so that even if more than one instance is running, 
there won't be issues if one finishes while the others are still going. wdyt? 
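
To make the proposal concrete, here is a rough in-memory sketch of the 
bookkeeping I have in mind. It is not HBase or ZooKeeper code: the class name, 
method names, switch names and znode paths are all made up for illustration, 
and the znode watch is simulated by calling on_znode_deleted() by hand. It 
only models the case where a znode path is supplied. 

```python
class MaintenanceModeSketch:
    """Hypothetical model of the proposed master-side maintenance mode."""

    def __init__(self):
        # Operator-configured switch values. Maintenance mode never rewrites
        # them, so there is nothing to save and restore afterwards.
        self.switches = {
            "split": True,
            "merge": True,
            "balancer": True,
            "catalog_janitor": True,
        }
        # Ephemeral znode paths being watched, one per running HBCK instance.
        self.watched = set()

    def enter(self, znode_path):
        """Called when an admin / HBCK asks for maintenance mode."""
        self.watched.add(znode_path)

    def on_znode_deleted(self, znode_path):
        """Stand-in for the watch firing when an ephemeral znode goes away,
        whether the HBCK instance finished cleanly or was aborted."""
        self.watched.discard(znode_path)

    @property
    def in_maintenance(self):
        # Maintenance mode lasts as long as at least one znode is still alive.
        return bool(self.watched)

    def is_enabled(self, switch):
        # Effective value: the configured value, masked by maintenance mode.
        return self.switches[switch] and not self.in_maintenance


master = MaintenanceModeSketch()
master.enter("/hbase/hbck/instance-1")  # hypothetical znode paths
master.enter("/hbase/hbck/instance-2")

master.on_znode_deleted("/hbase/hbck/instance-1")  # first HBCK exits or is killed
still_blocked = not master.is_enabled("split")     # second HBCK is still running

master.on_znode_deleted("/hbase/hbck/instance-2")  # last HBCK goes away
restored = master.is_enabled("split")              # configured value is back
```

The point of the design is that the switches themselves are never touched: 
maintenance mode only masks them, so an aborted HBCK cannot leave a switch in 
the wrong state. 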

> Split / merge switch left disabled after early termination of hbck
> ------------------------------------------------------------------
>
>                 Key: HBASE-15406
>                 URL: https://issues.apache.org/jira/browse/HBASE-15406
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>            Assignee: Heng Chen
>            Priority: Critical
>              Labels: reviewed
>             Fix For: 2.0.0, 1.3.0, 1.4.0
>
>         Attachments: HBASE-15406.patch, HBASE-15406.v1.patch, 
> HBASE-15406_v1.patch, HBASE-15406_v2.patch, test.patch, wip.patch
>
>
> This was what I did on cluster with 1.4.0-SNAPSHOT built Thursday:
> Run 'hbase hbck -disableSplitAndMerge' on gateway node of the cluster
> Terminate hbck early
> Enter hbase shell where I observed:
> {code}
> hbase(main):001:0> splitormerge_enabled 'SPLIT'
> false
> 0 row(s) in 0.3280 seconds
> hbase(main):002:0> splitormerge_enabled 'MERGE'
> false
> 0 row(s) in 0.0070 seconds
> {code}
> Expectation is that the split / merge switches should be restored to default 
> value after hbck exits.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
