[ 
https://issues.apache.org/jira/browse/HBASE-24439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Kyle Purtell updated HBASE-24439:
----------------------------------------
    Issue Type: Brainstorming  (was: Improvement)

> Replication queue recovery tool for rescuing deep queues
> --------------------------------------------------------
>
>                 Key: HBASE-24439
>                 URL: https://issues.apache.org/jira/browse/HBASE-24439
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Replication
>            Reporter: Andrew Kyle Purtell
>            Priority: Major
>
> In HBase cross site replication, on the source side, every regionserver 
> places its WALs into a replication queue and then drains the queue to the 
> remote sink cluster. At the source cluster every regionserver participates as 
> a source. At the sink cluster, a configurable subset of regionservers 
> volunteer to process inbound replication RPC. 
> When data is highly skewed we can take certain steps to mitigate, such as 
> pre-splitting, or manual splitting, and rebalancing. This can most 
> effectively be done at the sink, because replication RPCs are randomly 
> distributed over the set of receiving regionservers, and splitting on the 
> sink side can effectively redistribute resulting writes there. On the source 
> side we are more limited. 
> If writes are deeply unbalanced, a regionserver's source replication queue 
> may become very deep. Hotspotting can happen, despite mitigations. Unlike on 
> the sink side, once hotspotting has happened at the source, it is not 
> possible to increase parallelism or redistribute work among sources once WALs 
> have already been enqueued. Increasing parallelism on the sink side will not 
> help if there is a big rock at the source. Source side mitigations like 
> splitting and redistribute cannot help deep queues already accumulated.
> Can we redistribute source work? Yes and no. If a source regionserver fails, 
> its queues will be recovered by other regionservers. However the other rs 
> must still serve the recovered queue as an atomic entity. We can move a deep 
> queue, but we can't break it up. 
> Where time is of the essence, and ordering semantics can be allowed to break, 
> operators should have available to them a recovery tool that rescues their 
> production from the consequences of  deep source queues. A very large 
> replication queue can be split into many smaller queues. Perhaps even one new 
> queue for each WAL file. Then, these new synthetic queues can be distributed 
> to any/all source regionservers through the normal recovery queue assignment 
> protocol. This increases parallelism at the source.
> Of course this would break serial replication semantics, and sync replication 
> semantics, and even in branch-1 which does not have these features would 
> highly increase the probability of reordering of edits. That is an 
> unavoidable consequence of breaking up the queue for more parallelism, but as 
> long as this is done by a separate tool, invoked by operators, it is a valid 
> option for emergency drain if backed up replication queues. Every cell in the 
> WAL entries carries a timestamp assigned at the source, and will be applied 
> on the sink with this timestamp. When the queue is drained and all edits have 
> been persisted at the target, there will be a complete and correct temporal 
> data ordering at that time. An operator will be and must be prepared to 
> handle intermediate mis-/re-ordered states if they intend to invoke this 
> tool. In many use cases the interim states are not important. The final state 
> after all edits have transferred cross cluster and persisted at this sink, 
> after invocation of the recovery tool, is the point where the operator would 
> transition back into service.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to