[
https://issues.apache.org/jira/browse/HBASE-21743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16765480#comment-16765480
]
Sergey Shelukhin edited comment on HBASE-21743 at 2/11/19 10:54 PM:
--------------------------------------------------------------------
Ok, after reading up on this a little, I think the better term I'm looking for
is declarative assignment.
The approach to assignment that is much less error prone (IMO) is to always
operate from "this is the current state" vs "this is the desired state" (which
HBase already has, e.g. in the heartbeat, but doesn't use that way), as opposed
to the imperative approach of "do this", "I did this", "ok now do that", given
the distributed nature of the system. It is also more resilient because AM can
always see the state and doesn't depend on the sequence of operations, lost
messages, incorrect hbck or manual interventions messing things up, or even just
racing with master itself; so it can resolve the situation in most error cases.
It can work with procedures that require multi-step operations, which can still
be imperative. Assuming only one high-level procedure at a time (e.g. a region
cannot be splitting and also merging), the existence of an attached procedure to
do something for a region is just a piece of state that declarative assignment
can consider. Alternatively, in per-region processing, master can both move
procedures forward and process state in a single thread (per region; there can
be multiple threads, each handling one region at a time, if desired). The latter
approach can simplify things because no sync is needed and all the interactions
are visible. Other components, like master startup or the load balancer, can
issue commands (e.g. to move a region). The procedures can also issue
desired-state changes (e.g. unassign for split), and optionally process current
state changes. If there's no procedure, or the procedure declines to react to a
state change, the standard handler can compare desired and actual state and
drive assignment. As long as the state (e.g. OPENING) is set correctly, which is
already a requirement, this will always get the region into the correct state
eventually, regardless of what's going on. It will also have less potential for
races with procedures, because procedures will operate on the same
notifications on the same thread, and can override default processing.
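To make the idea concrete, here is a minimal sketch of the per-region handler
described above. It is not HBase code: the State enum, the Region class, the
action strings and the procedure callback are all hypothetical names invented
for illustration. The handler is called on every state notification; an
attached procedure gets the first chance to react, and otherwise the default
logic just compares desired and current state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class State(Enum):
    # Hypothetical subset of region states, for illustration only.
    OFFLINE = "OFFLINE"
    OPENING = "OPENING"
    OPEN = "OPEN"
    CLOSING = "CLOSING"
    CLOSED = "CLOSED"


@dataclass
class Region:
    name: str
    current: State   # last observed state
    desired: State   # assumed to be either OPEN or CLOSED
    # Optional attached procedure (assumption): returns an action to override
    # default processing, or None to decline to react.
    procedure: Optional[Callable[["Region"], Optional[str]]] = None


def next_action(region: Region) -> Optional[str]:
    """Return the next step for a region, or None if it has converged."""
    # An attached procedure sees the same notification first and may override.
    if region.procedure is not None:
        override = region.procedure(region)
        if override is not None:
            return override
    # Default handler: a transition is already in flight, so do nothing yet.
    if region.current in (State.OPENING, State.CLOSING):
        return "wait"
    if region.desired == State.OPEN:
        return None if region.current == State.OPEN else "open"
    # Desired state is CLOSED.
    return "close" if region.current == State.OPEN else None


# A region that should be open but is closed gets an open request; repeating
# the call after a lost message gives the same answer, so retries are safe.
r = Region("r1", current=State.CLOSED, desired=State.OPEN)
print(next_action(r))
```

Because the function only reads state, it does not matter how the region got
into that state (failed procedure, manual fix, master restart); the same
comparison drives it toward the desired state. A real handler would also need
timeouts for in-flight transitions and a check that the hosting server is alive.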
In general, in this approach, if it's trivial to save state declaratively (as
in the case of assignment, where looking at the cluster it's always possible to
determine what to do), it should be stored as such; if not, then procedures
should be used. I frankly think splits/merges can also be declarative and
procedures are mostly needed for table-wide operations, but I can see how, being
multi-region operations, split and merge can also benefit from an imperative
approach.
> declarative assignment
> ----------------------
>
> Key: HBASE-21743
> URL: https://issues.apache.org/jira/browse/HBASE-21743
> Project: HBase
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Priority: Major
>
> After running HBase for only a few weeks, we found dozen(s?) of bugs with
> assignment that all seem to have the same nature - split brain between 2
> procedures; or between procedure and master startup (meta replica bugs); or
> procedure and master shutdown (HBASE-21742); or procedure and something else
> (when SCP had incorrect region list persisted, don't recall the bug#).
> To me, it starts to look like a pattern: as in AMv1, where concurrent
> interactions were unclear and hard to reason about, the problem of unclear
> concurrent interactions has been preserved in AMv2 despite its cleaner
> individual pieces, and has in fact increased because of operation state
> persistence and isolation.
> Procedures are great for multi-step operations that need rollback and stuff
> like that, e.g. creating a table or snapshot, or even region splitting.
> However I'm not so sure about assignment.
> We have the persisted information - region state in meta (incl. transition
> states like opening or closing), and the server list as the WAL directory
> list. Procedure state is not any more reliable than those (we can argue that
> a meta update can fail, but so can a procv2 WAL flush, so we have to handle
> cases of out-of-date information regardless). So, we don't need any extra
> state to decide on assignment, whether for recovery or balancing. In fact, as
> mentioned in some bugs, deleting the procv2 WAL is often the best way to
> recover the cluster, because master can already figure out what to do without
> additional state.
> I think there should be an option for stateless assignment that does that.
> It can be either a separate pluggable assignment procedure, or an option
> that will not recover SCP, RITs etc. from the WAL but always derive recovery
> procedures from the existing cluster state.
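As a rough illustration of the stateless option proposed in the description, the
sketch below derives recovery work purely from the two persisted sources it
names: region state in meta and the server list implied by WAL directories.
Nothing here is an HBase API; derive_recovery, the (state, server) tuples and
the plain string state names are assumptions made for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical view of a meta row: (region state, hosting server or None).
MetaRow = Tuple[str, Optional[str]]


def derive_recovery(
    meta: Dict[str, MetaRow],
    live_servers: Set[str],
    servers_with_wals: Set[str],
) -> Tuple[List[str], List[str]]:
    """Return (dead servers needing crash handling, regions to reassign)."""
    # A server that left WALs behind but is no longer live is treated as dead.
    dead = sorted(servers_with_wals - live_servers)
    to_assign = []
    for region, (state, server) in sorted(meta.items()):
        # A region recorded as open or opening on a non-live server cannot
        # actually be served, whatever any procedure state might claim.
        if state in ("OPEN", "OPENING") and server not in live_servers:
            to_assign.append(region)
    return dead, to_assign


meta = {
    "r1": ("OPEN", "rs1"),
    "r2": ("OPENING", "rs2"),
    "r3": ("CLOSED", None),
    "r4": ("OPEN", "rs2"),
}
dead, regions = derive_recovery(meta, live_servers={"rs1"},
                                servers_with_wals={"rs1", "rs2"})
print(dead, regions)
```

The result depends only on what is in meta and on the WAL directory listing, so
it can be recomputed from scratch after a master restart without replaying any
procedure WAL. WAL splitting for the dead servers and handling of stale meta
rows are deliberately left out.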