[ https://issues.apache.org/jira/browse/HBASE-21156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-21156:
--------------------------
Description:
We need this to effect repair when there is damage.
If the procedure WALs AND a server WAL dir are lost or cleaned, or we crash
during a partial split (unlikely scenarios but nonetheless possible), a Master
can get stuck, unable to become active, because there is no assign procedure
for hbase:meta in the system.
The reasonable argument over in HBASE-21035 is that attempts at auto-repair
under these extremes could cause other issues, so at least until we learn more,
we punt to the operator for fix-up.
To reproduce the catastrophe, see notes in HBASE-21035 (and [~allan163]'s test).
UPDATE:
was:
We need this to effect repair when there is damage.
If the procedure WALs AND a server WAL dir are lost or cleaned, or we crash
during a partial split (unlikely scenarios but nonetheless possible), a Master
can get stuck, unable to become active, because there is no assign procedure
for hbase:meta in the system.
The reasonable argument over in HBASE-21035 is that attempts at auto-repair
under these extremes could cause other issues, so at least until we learn more,
we punt to the operator for fix-up.
To reproduce the catastrophe, see notes in HBASE-21035 (and [~allan163]'s test).
> [hbck2] Queue an assign of hbase:meta and bulk assign/unassign
> --------------------------------------------------------------
>
> Key: HBASE-21156
> URL: https://issues.apache.org/jira/browse/HBASE-21156
> Project: HBase
> Issue Type: Sub-task
> Components: hbck2
> Affects Versions: 2.1.0
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Fix For: 2.1.1
>
>
> We need this to effect repair when there is damage.
> If the procedure WALs AND a server WAL dir are lost or cleaned, or we crash
> during a partial split (unlikely scenarios but nonetheless possible), a Master
> can get stuck, unable to become active, because there is no assign procedure
> for hbase:meta in the system.
> The reasonable argument over in HBASE-21035 is that attempts at auto-repair
> under these extremes could cause other issues, so at least until we learn
> more, we punt to the operator for fix-up.
> To reproduce the catastrophe, see notes in HBASE-21035 (and [~allan163]'s
> test).
> UPDATE: