[
https://issues.apache.org/jira/browse/HBASE-24439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Kyle Purtell resolved HBASE-24439.
-----------------------------------------
Assignee: (was: Sandeep Pal)
Resolution: Won't Fix
> Replication queue recovery tool for rescuing deep queues
> --------------------------------------------------------
>
> Key: HBASE-24439
> URL: https://issues.apache.org/jira/browse/HBASE-24439
> Project: HBase
> Issue Type: Brainstorming
> Components: Replication
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> In HBase cross site replication, on the source side, every regionserver
> places its WALs into a replication queue and then drains the queue to the
> remote sink cluster. At the source cluster every regionserver participates as
> a source. At the sink cluster, a configurable subset of regionservers
> volunteer to process inbound replication RPC.
> When data is highly skewed we can take certain steps to mitigate, such as
> pre-splitting, or manual splitting, and rebalancing. This can most
> effectively be done at the sink, because replication RPCs are randomly
> distributed over the set of receiving regionservers, and splitting on the
> sink side can effectively redistribute resulting writes there. On the source
> side we are more limited.
> If writes are deeply unbalanced, a regionserver's source replication queue
> may become very deep. Hotspotting can happen, despite mitigations. Unlike on
> the sink side, once hotspotting has happened at the source, it is not
> possible to increase parallelism or redistribute work among sources once WALs
> have already been enqueued. Increasing parallelism on the sink side will not
> help if there is a big rock at the source. Source side mitigations like
> splitting and region redistribution cannot help deep queues already
> accumulated.
> Can we redistribute source work? Yes and no. If a source regionserver fails,
> its queues will be recovered by other regionservers. However the other rs
> must still serve the recovered queue as an atomic entity. We can move a deep
> queue, but we can't break it up.
> Where time is of the essence, and ordering semantics can be allowed to break,
> operators should have available to them a recovery tool that rescues their
> production from the consequences of deep source queues. A very large
> replication queue can be split into many smaller queues. Perhaps even one new
> queue for each WAL file. Then, these new synthetic queues can be distributed
> to any/all source regionservers through the normal recovery queue assignment
> protocol. This increases parallelism at the source.
> Of course this would break serial replication semantics and even in branch-1
> which does not have that feature it would signficantly increase the
> probability of reordering of edits. That is an unavoidable consequence of
> breaking up the queue for more parallelism. As long as this is done by a
> separate tool, invoked by operators, it is a valid option for emergency
> drain, and once the drain is complete, the final state will be properly
> ordered. Every cell in the WAL entries carries a timestamp assigned at the
> source, and will be applied on the sink with this timestamp. When the queue
> is drained and all edits have been persisted at the target, there will be a
> complete and correct temporal data ordering at that time. An operator will be
> and must be prepared to handle intermediate mis-/re-ordered states if they
> intend to invoke this tool. In many use cases the interim states are not
> important. The final state after all edits have transferred cross cluster and
> persisted at this sink, after invocation of the recovery tool, is the point
> where the operator would transition back into service.
> As a strawman we can propose these work items:
> - Add a replication admin command that can reassign a replication queue away
> from an active source. The active source makes a new queue and continues. The
> previously active queue can be assigned to another regionserver as a recovery
> queue or can be left unassigned (e.g. target = null)
> - Administratively unassigned recovery queues should not be automatically
> processed, but must be discoverable.
> - Add a replication admin command that transitions an unassigned replication
> queue into an active and eligible recovery queue.
> - Create a tool that uses these new APIs to take control of a (presumably
> deep) replication queue, breaks up the queue into its constituent WAL files,
> creates new synthetic queues according to a configurable and parameterized
> grouping function, and uses the new APIs to make the new synthetic queues
> eligible for recovery. The original queue retains one group as defined by the
> grouping policy and itself is made re-eligible for recovery.
--
This message was sent by Atlassian Jira
(v8.20.7#820007)