[jira] [Comment Edited] (HBASE-21743) stateless assignment

Sergey Shelukhin (JIRA) Mon, 28 Jan 2019 10:52:50 -0800


    [ 
https://issues.apache.org/jira/browse/HBASE-21743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16754246#comment-16754246
 ]


Sergey Shelukhin edited comment on HBASE-21743 at 1/28/19 6:51 PM:
-------------------------------------------------------------------

Ok, I have less time for that now due to having to debug all the issues ;)
However, split and merge are not covered by my "smaller" proposal, where the 
master (if configured) will ignore only recovery-related procedures.
During a failure, the master should already be able to handle the state of 
some procedure not having been persisted (because by definition the cluster is 
much more likely to be in a bad state), so it should also be able to abandon 
old recovery procedures (SCP & RIT and their children) as if they were not 
saved, and create new ones during startup.
I will keep this JIRA for the larger feature (and later move the discussion to 
dev@ when there's more time :)), and file a separate JIRA (HBASE-21797) for 
the recovery part...
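
A minimal sketch of the "smaller" proposal, for illustration only. Every name 
here (RecoveryProcedureFilterSketch, Kind, Proc, procsToResume, the 
ignoreRecoveryProcs flag) is hypothetical and not part of HBase. It also 
assumes one particular reading of "SCP & RIT and their children": a persisted 
procedure is abandoned when the root of its parent chain is an SCP or a 
standalone RIT, so a RIT spawned by a split stays with the split.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; these types are illustrative, not HBase APIs. */
final class RecoveryProcedureFilterSketch {
    /** Simplified procedure kinds (assumed for this example). */
    enum Kind { SERVER_CRASH, REGION_TRANSITION, SPLIT, MERGE, CREATE_TABLE }

    /** A persisted procedure record reduced to id, kind and parent id. */
    static final class Proc {
        final long id;
        final Kind kind;
        final Long parentId; // null for a root procedure

        Proc(long id, Kind kind, Long parentId) {
            this.id = id;
            this.kind = kind;
            this.parentId = parentId;
        }
    }

    /** SCP and RIT are treated as the recovery-related kinds. */
    static boolean isRecovery(Kind kind) {
        return kind == Kind.SERVER_CRASH || kind == Kind.REGION_TRANSITION;
    }

    /**
     * Returns the procedures to resume at startup. With the flag set, any
     * procedure whose root ancestor is recovery-related is dropped, as if it
     * had never been saved; the master would create new ones instead.
     */
    static List<Proc> procsToResume(List<Proc> persisted, boolean ignoreRecoveryProcs) {
        if (!ignoreRecoveryProcs) {
            return new ArrayList<>(persisted);
        }
        Map<Long, Proc> byId = new HashMap<>();
        for (Proc p : persisted) {
            byId.put(p.id, p);
        }
        List<Proc> kept = new ArrayList<>();
        for (Proc p : persisted) {
            // Walk up the parent chain; a missing parent ends the walk.
            Proc root = p;
            while (root.parentId != null && byId.containsKey(root.parentId)) {
                root = byId.get(root.parentId);
            }
            if (!isRecovery(root.kind)) {
                kept.add(p);
            }
        }
        return kept;
    }
}
```

With the flag off, the list is returned unchanged, which matches the 
"if configured" wording above.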


was (Author: sershe):
Ok, I have less time for that now due to having to debug all the issues ;)
However split and merge are not covered by my "smaller" proposal, where master 
(if configured) will ignore only recovery-related procedures.
During failure, master should already be able to handle not persisting the 
state of some procedure (because by definition cluster is much more likely to 
be in a bad state), so it should also be able to abandon old recovery 
procedures (SCP & RIT and their children) as if they were not saved, and create 
new ones during startup.
 I will keep this JIRA for the larger feature (and later move the discussion to 
dev@ when there's more time :)), and file a separate JIRA for the recovery 
part... 

> stateless assignment
> --------------------
>
>                 Key: HBASE-21743
>                 URL: https://issues.apache.org/jira/browse/HBASE-21743
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> Running HBase for only a few weeks we found dozen(s?) of bugs with assignment 
> that all seem to have the same nature - split brain between 2 procedures; or 
> between procedure and master startup (meta replica bugs); or procedure and 
> master shutdown (HBASE-21742); or procedure and something else (when SCP had 
> incorrect region list persisted, don't recall the bug#). 
> To me, it starts to look like a pattern: as in AMv1, where concurrent 
> interactions were unclear and hard to reason about, the problem of unclear 
> concurrent interactions has been preserved in AMv2 despite its cleaner 
> individual pieces, and has in fact grown because of operation state 
> persistence and isolation.
> Procedures are great for multi-step operations that need rollback and stuff 
> like that, e.g. creating a table or snapshot, or even region splitting. 
> However I'm not so sure about assignment. 
> We have the persisted information - region state in meta (incl. transition 
> states like opening or closing), server list as WAL directory list. 
> Procedure state is not any more reliable than those (we can argue that meta 
> update can fail, but so can procv2 WAL flush, so we have to handle cases of 
> out of date information regardless). So, we don't need any extra state to 
> decide on assignment, whether for recovery or balancing. In fact, as 
> mentioned in some bugs, deleting procv2 WAL is often the best way to recover 
> the cluster, because master can already figure out what to do without 
> additional state.
> I think there should be an option for stateless assignment that does that.
> It can either be as a separate pluggable assignment procedure; or an option 
> that will not recover SCP, RITs etc from WAL but always derive recovery 
> procedures from the existing cluster state.
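
The idea of deriving recovery work from existing cluster state can be sketched 
roughly as below. This is an illustration under stated assumptions, not HBase 
code: StatelessRecoverySketch, State, RegionRow and regionsNeedingAssignment 
are hypothetical names, the meta rows are reduced to three fields, and how the 
set of live servers is obtained is left outside the sketch. It also simplifies 
by treating any non-CLOSED region whose last known server is not live as 
needing a new assignment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch; these types are illustrative, not HBase APIs. */
final class StatelessRecoverySketch {
    /** Simplified region states as they might be read back from meta. */
    enum State { OPEN, OPENING, CLOSING, CLOSED }

    /** One simplified meta row: region name, last state, last known server. */
    static final class RegionRow {
        final String region;
        final State state;
        final String server; // may be null if no server was recorded

        RegionRow(String region, State state, String server) {
            this.region = region;
            this.state = state;
            this.server = server;
        }
    }

    /**
     * Decides which regions need a new assignment using only persisted region
     * state and the live server set, with no saved procedure state.
     */
    static List<String> regionsNeedingAssignment(List<RegionRow> metaRows,
                                                 Set<String> liveServers) {
        List<String> toAssign = new ArrayList<>();
        for (RegionRow row : metaRows) {
            if (row.state == State.CLOSED) {
                continue; // assumed to be intentionally offline
            }
            boolean hostAlive = row.server != null && liveServers.contains(row.server);
            if (!hostAlive) {
                toAssign.add(row.region);
            }
        }
        return toAssign;
    }
}
```

Regions in OPENING or CLOSING on a live server are left alone here; a real 
implementation would have to re-check them against that server.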



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
