[ 
https://issues.apache.org/jira/browse/HBASE-21444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680700#comment-16680700
 ] 

Duo Zhang commented on HBASE-21444:
-----------------------------------

I lost all the procedures yesterday on a testing cluster after a master 
restart, and I'm still digging into it...

My concern here is that, once this happens, we do not know what the real 
problem is, so any automatic operation is dangerous. Recently we got into 
trouble when using HBCK: the tool does not seem able to deal with split 
regions and messed up everything. Thank god we had replication enabled, so we 
could copy the data back from another cluster. Can you imagine what would 
happen if we ran the HBCK operations automatically whenever something 
unexpected happened, and they then messed up everything? The users would kill 
us...

We can only write code to fix known bugs, cover known corner cases, and provide 
mechanisms and tools to recover from unknown bugs, but we cannot code for 
unknown bugs directly. That has always been my point.

> Recover meta in case of long ago dead region server appear in meta znode
> ------------------------------------------------------------------------
>
>                 Key: HBASE-21444
>                 URL: https://issues.apache.org/jira/browse/HBASE-21444
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.2
>            Reporter: Ankit Singhal
>            Assignee: Ankit Singhal
>            Priority: Major
>         Attachments: HBASE-21444.branch-2.0.001.patch, 
> HBASE-21444.branch-2.0.002.patch
>
>
> Ambari metric server uses HBase as storage and currently uses different 
> znodes (/hbase-unsecure and /hbase-secure) to differentiate secure and 
> unsecure deployments of HBase.  
> It also supports rolling the cluster back from kerberized to non-kerberized 
> (which includes a step that changes the znode from /hbase-secure to 
> /hbase-unsecure). With HBase 2.0, however, the meta-region-server znode 
> under the old ZooKeeper znode will point to a region server that is long 
> gone, and there will be no procedure to transition it, so meta stays stuck 
> indefinitely.
> One option is to clear the znodes before rolling back, but since this used 
> to work in prior releases thanks to RecoverMetaProcedure, the ask is whether 
> we can fix meta assignment when the znode holds the wrong state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
