[ https://issues.apache.org/jira/browse/HBASE-21156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HBASE-21156:
--------------------------
    Description: 
We need this to effect repair when damaged.

If procedure WALs AND a server WAL dir are lost or cleaned, or we crashed during 
a partial split (unlikely scenarios but nonetheless possible), a Master can be 
stuck, unable to become active, because there is no assign procedure for 
hbase:meta in the system.

The reasonable argument over in HBASE-21035 has it that attempts at auto-repair 
under these extremes could cause other issues, so at least until we learn more, 
we punt to the operator for fix-up.

To reproduce the catastrophe, see notes in HBASE-21035 (and [~allan163]'s test).

UPDATE: With HBASE-21191, a Master assumes a "holding pattern" if on startup 
it does not have an assign for meta (possible if we lose all Master WAL 
Procs). The holding pattern is needed because we were exiting after one minute 
of RPC'ing to the old meta location. Admin#assign won't work to inject an 
assign because it gets rejected with "Master is initializing". So we need to be 
able to assign hbase:meta even if "Master is initializing". Also, while in 
here, add bulk assign, because assigning a Region-at-a-time from the shell only 
works if the offlined region count is in the low 10s; it fails when thousands 
are offline.
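
To illustrate the bulk-assign idea, here is a rough sketch of submitting many 
region names in a few batched calls instead of one call per region. The 
BulkAssigner interface, the assignInBatches helper, and the batch size are all 
hypothetical names made up for this example; they are not the HBase or hbck2 
API, and the real change may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

public class BulkAssignSketch {

    /** Hypothetical stand-in for a bulk-assign call; NOT an HBase API. */
    public interface BulkAssigner {
        // Assumed contract: returns one procedure id per submitted region name.
        List<Long> assigns(List<String> encodedRegionNames);
    }

    /**
     * Splits the region names into batches of at most batchSize and submits
     * each batch in a single call, collecting the returned procedure ids.
     */
    public static List<Long> assignInBatches(
            BulkAssigner assigner, List<String> encodedRegionNames, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Long> pids = new ArrayList<>();
        for (int start = 0; start < encodedRegionNames.size(); start += batchSize) {
            int end = Math.min(start + batchSize, encodedRegionNames.size());
            // Copy the sub-list so the assigner never sees a view of our list.
            List<String> batch = new ArrayList<>(encodedRegionNames.subList(start, end));
            pids.addAll(assigner.assigns(batch));
        }
        return pids;
    }

    public static void main(String[] args) {
        // Made-up region names, only to exercise the helper.
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            names.add("region-" + i);
        }
        int[] calls = {0};
        // Fake assigner: counts calls and hands back dummy procedure ids.
        BulkAssigner fake = batch -> {
            calls[0]++;
            List<Long> out = new ArrayList<>();
            for (int i = 0; i < batch.size(); i++) {
                out.add((long) i);
            }
            return out;
        };
        List<Long> pids = assignInBatches(fake, names, 1000);
        System.out.println(pids.size() + " assigns submitted in " + calls[0] + " calls");
    }
}
```

The point of batching is that the number of round trips grows with the number 
of batches rather than with the number of offline regions.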

  was:
We need this to effect repair when damage.

If procedure WALs AND a server WAL dir are lost or cleaned or we crashed during 
partial split (unlikely scenarios but nonetheless possible), a Master can be 
stuck unable to become active because there is no assign procedure for 
hbase:meta in the system.

The reasonable argument over in HBASE-21035 has it that attempts at auto-repair 
under these extremes could cause other issues so at least until we learn more, 
we for now punt to the operator for fix-up.

To reproduce the catastrophe, see notes in HBASE-21035 (and [~allan163]'s test).

UPDATE: 


> [hbck2] Queue an assign of hbase:meta and bulk assign/unassign
> --------------------------------------------------------------
>
>                 Key: HBASE-21156
>                 URL: https://issues.apache.org/jira/browse/HBASE-21156
>             Project: HBase
>          Issue Type: Sub-task
>          Components: hbck2
>    Affects Versions: 2.1.0
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 2.1.1



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
