[
https://issues.apache.org/jira/browse/HBASE-24439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Kyle Purtell updated HBASE-24439:
----------------------------------------
Issue Type: Brainstorming (was: Improvement)
> Replication queue recovery tool for rescuing deep queues
> --------------------------------------------------------
>
> Key: HBASE-24439
> URL: https://issues.apache.org/jira/browse/HBASE-24439
> Project: HBase
> Issue Type: Brainstorming
> Components: Replication
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> In HBase cross site replication, on the source side, every regionserver
> places its WALs into a replication queue and then drains the queue to the
> remote sink cluster. At the source cluster every regionserver participates as
> a source. At the sink cluster, a configurable subset of regionservers
> volunteer to process inbound replication RPC.
> When data is highly skewed we can take certain steps to mitigate, such as
> pre-splitting, or manual splitting, and rebalancing. This can most
> effectively be done at the sink, because replication RPCs are randomly
> distributed over the set of receiving regionservers, and splitting on the
> sink side can effectively redistribute resulting writes there. On the source
> side we are more limited.
> If writes are deeply unbalanced, a regionserver's source replication queue
> may become very deep. Hotspotting can happen, despite mitigations. Unlike on
> the sink side, once hotspotting has happened at the source, it is not
> possible to increase parallelism or redistribute work among sources once WALs
> have already been enqueued. Increasing parallelism on the sink side will not
> help if there is a big rock at the source. Source side mitigations like
> splitting and redistribute cannot help deep queues already accumulated.
> Can we redistribute source work? Yes and no. If a source regionserver fails,
> its queues will be recovered by other regionservers. However the other rs
> must still serve the recovered queue as an atomic entity. We can move a deep
> queue, but we can't break it up.
> Where time is of the essence, and ordering semantics can be allowed to break,
> operators should have available to them a recovery tool that rescues their
> production from the consequences of deep source queues. A very large
> replication queue can be split into many smaller queues. Perhaps even one new
> queue for each WAL file. Then, these new synthetic queues can be distributed
> to any/all source regionservers through the normal recovery queue assignment
> protocol. This increases parallelism at the source.
> Of course this would break serial replication semantics, and sync replication
> semantics, and even in branch-1 which does not have these features would
> highly increase the probability of reordering of edits. That is an
> unavoidable consequence of breaking up the queue for more parallelism, but as
> long as this is done by a separate tool, invoked by operators, it is a valid
> option for emergency drain if backed up replication queues. Every cell in the
> WAL entries carries a timestamp assigned at the source, and will be applied
> on the sink with this timestamp. When the queue is drained and all edits have
> been persisted at the target, there will be a complete and correct temporal
> data ordering at that time. An operator will be and must be prepared to
> handle intermediate mis-/re-ordered states if they intend to invoke this
> tool. In many use cases the interim states are not important. The final state
> after all edits have transferred cross cluster and persisted at this sink,
> after invocation of the recovery tool, is the point where the operator would
> transition back into service.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)