[
https://issues.apache.org/jira/browse/HBASE-21444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16680700#comment-16680700
]
Duo Zhang commented on HBASE-21444:
-----------------------------------
I lost all the procedures on a testing cluster yesterday after a master
restart, and I'm still digging into it...
My concern here is that, once this happens, we do not know what the real
problem is, so any automatic operation is dangerous. Recently we got into
trouble when using HBCK: the tool seems unable to deal with split regions
and messed up everything. Thank god we had replication enabled, so we could
copy the data back from another cluster. Can you imagine us running HBCK
operations automatically when something unexpected happens, and then messing
up everything? The users will kill us...
We can only write code to fix known bugs and cover known corner cases, and
provide mechanisms and tools to recover from unknown bugs; we cannot code for
unknown bugs directly. That has always been my point.
> Recover meta in case of long ago dead region server appear in meta znode
> ------------------------------------------------------------------------
>
> Key: HBASE-21444
> URL: https://issues.apache.org/jira/browse/HBASE-21444
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.0.2
> Reporter: Ankit Singhal
> Assignee: Ankit Singhal
> Priority: Major
> Attachments: HBASE-21444.branch-2.0.001.patch,
> HBASE-21444.branch-2.0.002.patch
>
>
> Ambari metric server uses HBase as storage and currently uses different
> znodes (/hbase-unsecure and /hbase-secure) to differentiate secure and
> unsecure deployments of HBase.
> It also supports rolling the cluster back from kerberised to
> non-kerberised (which includes changing the znode from /hbase-secure to
> /hbase-unsecure). With HBase 2.0, however, the meta-region-server znode under
> the old ZooKeeper znode will name a regionserver that is long gone, and there
> will be no procedure to transition it, leaving it stuck indefinitely.
> One option is to clear the znodes before rolling back, but since this used to
> work in prior releases thanks to RecoverMetaProcedure, the ask is whether we
> can fix meta assignment when the znode holds a wrong state.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)