[ 
https://issues.apache.org/jira/browse/HBASE-23270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076037#comment-17076037
 ] 

Diptam Ghosh commented on HBASE-23270:
--------------------------------------

In my view, "*HbaseInterClusterReplication*" handles too many 
responsibilities. Separating them out will give us more flexibility before 
we proceed to implementing RSGroup-aware replication. 
 # We can extract "batch creation", "sink management" and "triggering 
replication of one batch" out of *this* class, leaving it with "init config" 
and "manage shipping logic with failure handling". The extracted tasks can be 
abstracted behind a Processor class. Sink selection can further be 
abstracted behind a strategy implementation. 
 # With this design, we can implement a custom sink selector and custom 
batching logic, and configure both from outside as well. 
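
A minimal sketch of the split proposed above, assuming nothing about the 
existing HBase classes: every type name here (SinkSelector, 
RandomSinkSelector, RsGroupAwareSinkSelector, ReplicationBatchProcessor) is 
hypothetical, and plain strings stand in for sink server names and WAL 
entries. Falling back to any sink when the target group has no live member is 
also an assumption, not something settled in this issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical strategy: picks the destination sink for one batch. */
interface SinkSelector {
    String selectSink(List<String> candidateSinks, String targetRsGroup);
}

/** Sketch of group-unaware selection: any sink in the peer cluster may be chosen. */
final class RandomSinkSelector implements SinkSelector {
    @Override
    public String selectSink(List<String> candidateSinks, String targetRsGroup) {
        if (candidateSinks.isEmpty()) {
            throw new IllegalStateException("no sinks available");
        }
        int index = ThreadLocalRandom.current().nextInt(candidateSinks.size());
        return candidateSinks.get(index);
    }
}

/** Sketch of RSGroup-aware selection: restricts the choice to the target group. */
final class RsGroupAwareSinkSelector implements SinkSelector {
    private final Map<String, String> sinkToGroup; // sink name -> RSGroup name (assumed input)
    private final SinkSelector fallback = new RandomSinkSelector();

    RsGroupAwareSinkSelector(Map<String, String> sinkToGroup) {
        this.sinkToGroup = sinkToGroup;
    }

    @Override
    public String selectSink(List<String> candidateSinks, String targetRsGroup) {
        List<String> inGroup = new ArrayList<>();
        for (String sink : candidateSinks) {
            if (targetRsGroup.equals(sinkToGroup.get(sink))) {
                inGroup.add(sink);
            }
        }
        // Assumption: if no sink belongs to the group, fall back to any sink.
        List<String> pool = inGroup.isEmpty() ? candidateSinks : inGroup;
        return fallback.selectSink(pool, targetRsGroup);
    }
}

/** Hypothetical processor: owns batch creation and per-batch sink selection. */
final class ReplicationBatchProcessor {
    private final SinkSelector selector;
    private final int batchSize;

    ReplicationBatchProcessor(SinkSelector selector, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.selector = selector;
        this.batchSize = batchSize;
    }

    /** Splits the entries into consecutive batches of at most batchSize. */
    List<List<String>> createBatches(List<String> entries) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < entries.size(); i += batchSize) {
            int end = Math.min(i + batchSize, entries.size());
            batches.add(new ArrayList<>(entries.subList(i, end)));
        }
        return batches;
    }

    /** Delegates sink selection to the configured strategy. */
    String sinkFor(List<String> candidateSinks, String targetRsGroup) {
        return selector.selectSink(candidateSinks, targetRsGroup);
    }
}
```

With this shape, the endpoint class would keep only configuration and the 
shipping loop with failure handling, and the selector passed to the processor 
could be chosen from configuration.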

> Inter-cluster replication is unaware destination peer cluster's RSGroup to 
> push the WALEdits
> --------------------------------------------------------------------------------------------
>
>                 Key: HBASE-23270
>                 URL: https://issues.apache.org/jira/browse/HBASE-23270
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Pradeep
>            Assignee: Pradeep
>            Priority: Major
>
> In a source RSGroup-enabled HBase cluster where replication is enabled to 
> another destination RSGroup-enabled cluster, the replication stream of 
> List<WALEdit.Entry> goes to any node in the destination cluster without 
> awareness of RSGroup and then gets routed to the appropriate node where the 
> region is hosted. This extra hop, where the data is received and routed, 
> could be any node in the cluster, and no restriction exists to select a node 
> within the same RSGroup.
> Implications: An RSGroup owner in a multi-tenant HBase cluster can see 
> performance and throughput deviations because of this unpredictability caused 
> by replication.
> Potential fix options:
> a) Select a destination node with RSGroup awareness.
> b) Group the WAL.Edit list by region and then by the region servers to 
> which the regions are assigned in the destination. Pass the WALEdit list 
> directly to the region server to avoid the extra intermediate hop in the 
> destination cluster during the replication process. 
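
Option (b) in the description above can be sketched roughly as follows. This 
is only an illustration: EditEntry and RegionServerGrouper are hypothetical 
names, not HBase types, and the region-to-server map is assumed to be 
available to the source cluster, which the issue does not establish.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a WAL entry: just a region name and a payload. */
final class EditEntry {
    final String region;
    final String payload;

    EditEntry(String region, String payload) {
        this.region = region;
        this.payload = payload;
    }
}

final class RegionServerGrouper {
    /** Step 1: group the entries by region, preserving arrival order. */
    static Map<String, List<EditEntry>> groupByRegion(List<EditEntry> entries) {
        Map<String, List<EditEntry>> byRegion = new LinkedHashMap<>();
        for (EditEntry entry : entries) {
            byRegion.computeIfAbsent(entry.region, k -> new ArrayList<>()).add(entry);
        }
        return byRegion;
    }

    /** Step 2: merge the per-region groups by the server hosting each region. */
    static Map<String, List<EditEntry>> groupByServer(
            List<EditEntry> entries, Map<String, String> regionToServer) {
        Map<String, List<EditEntry>> byServer = new LinkedHashMap<>();
        for (Map.Entry<String, List<EditEntry>> group : groupByRegion(entries).entrySet()) {
            String server = regionToServer.get(group.getKey());
            if (server == null) {
                // Assumption: an unknown region location is treated as an error.
                throw new IllegalArgumentException("no location for region " + group.getKey());
            }
            byServer.computeIfAbsent(server, k -> new ArrayList<>()).addAll(group.getValue());
        }
        return byServer;
    }
}
```

Each resulting list could then be shipped straight to its region server, 
which is what removes the intermediate hop.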



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
