[ https://issues.apache.org/jira/browse/HBASE-26950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17521341#comment-17521341 ]
Bryan Beaudreault commented on HBASE-26950:
-------------------------------------------
I'm not planning to work on this right now, but I took a quick look.
Converting ReplicationSink itself is straightforward: just call {{.join()}} on
the futures where necessary. The complication is replicated bulk loads, because
LoadIncrementalHFiles depends heavily on the blocking Table and Connection.
The easiest approach might be to maintain two separate connections: an async
one for batch calls and a sync one for HFiles.
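A minimal sketch of that two-connection idea, written against the JDK only so it runs standalone. `AsyncBatchClient`, `SyncBulkLoadClient`, and `Sink` are hypothetical stand-ins, not HBase types: the async client mirrors the shape of `AsyncTable#batch` (one future per action), and the sync client stands in for the blocking Table/Connection path that LoadIncrementalHFiles would keep using. Against real HBase, the async side would come from `ConnectionFactory.createAsyncConnection(conf)`, which returns a `CompletableFuture<AsyncConnection>`.

```java
import java.util.List;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.stream.Collectors;

public class ReplicationSinkJoinSketch {

  // Hypothetical stand-in for an async table: one future per action,
  // similar in shape to AsyncTable#batch. Not an HBase interface.
  interface AsyncBatchClient {
    List<CompletableFuture<Void>> batch(List<String> rows);
  }

  // Hypothetical stand-in for the blocking Table/Connection path that
  // bulk loads would keep using.
  interface SyncBulkLoadClient {
    void loadHFiles(List<String> hfilePaths);
  }

  // Hypothetical sink holding two clients: async for batch calls,
  // sync for bulk loads.
  static final class Sink {
    private final AsyncBatchClient asyncClient;
    private final SyncBulkLoadClient syncClient;

    Sink(AsyncBatchClient asyncClient, SyncBulkLoadClient syncClient) {
      this.asyncClient = asyncClient;
      this.syncClient = syncClient;
    }

    // Callers still see a blocking method: submit every action, then join.
    void replicateEntries(List<String> rows) {
      List<CompletableFuture<Void>> futures = asyncClient.batch(rows);
      // join() blocks until all actions finish; if any failed, it throws
      // a CompletionException wrapping the failure.
      CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0])).join();
    }

    // Bulk loads stay on the blocking client unchanged.
    void replicateBulkLoad(List<String> hfilePaths) {
      syncClient.loadHFiles(hfilePaths);
    }
  }

  public static void main(String[] args) {
    Queue<String> written = new ConcurrentLinkedQueue<>();
    Queue<String> loaded = new ConcurrentLinkedQueue<>();
    // In-memory fakes: each row is "written" on the common pool.
    AsyncBatchClient async = rows -> rows.stream()
        .map(r -> CompletableFuture.runAsync(() -> written.add(r)))
        .collect(Collectors.toList());
    SyncBulkLoadClient sync = loaded::addAll;

    Sink sink = new Sink(async, sync);
    sink.replicateEntries(List.of("row1", "row2", "row3"));
    sink.replicateBulkLoad(List.of("/tmp/hfile-a"));
    System.out.println(written.size() + " rows, " + loaded.size() + " hfile"); // prints "3 rows, 1 hfile"
  }
}
```

Joining inside `replicateEntries` keeps the sink's existing blocking contract, so callers do not change; only the client underneath does.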
> Use AsyncConnection in ReplicationSink
> --------------------------------------
>
> Key: HBASE-26950
> URL: https://issues.apache.org/jira/browse/HBASE-26950
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 2.4.11
> Reporter: Bryan Beaudreault
> Priority: Major
>
> We don't necessarily need to rewrite ReplicationSink to be fully async. I
> think it would benefit simply from using ConnectionFactory.createAsyncConnection
> instead of ConnectionFactory.createConnection.
> The reasons for this are:
> * AsyncConnection is the more modern implementation, the only implementation
> in master, and where most of the efforts will be going forward.
> * ReplicationSink only does batch calls, and batch calls are done with
> AsyncProcess. It's likely that the native AsyncTable is better than
> AsyncProcess for this.
> ** One specific example: AsyncProcess calls findAllLocationsOrFail
> sequentially for every action in a batch. With the default replication batch
> size of 5k, this can take quite a while if the actions are spread across many
> regions. In AsyncTable, these lookups are done in parallel.
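To illustrate why a sequential lookup hurts at batch sizes like 5k, here is a JDK-only sketch comparing a one-at-a-time loop with starting every lookup up front and then joining. `slowLocate` is a hypothetical lookup with an artificial 10 ms delay, and the fixed thread pool is only a stand-in for concurrency: AsyncTable issues its lookups as non-blocking calls rather than on dedicated threads, and neither method reproduces findAllLocationsOrFail or AsyncTable internals.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

public class LocateAllSketch {

  // Sequential: each lookup must finish before the next starts, so total
  // latency grows roughly with (number of rows) x (lookup latency).
  static Map<String, String> locateSequentially(List<String> rows, Function<String, String> locate) {
    Map<String, String> out = new LinkedHashMap<>();
    for (String row : rows) {
      out.put(row, locate.apply(row));
    }
    return out;
  }

  // Parallel: start every lookup first, then wait for all of them.
  static Map<String, String> locateInParallel(
      List<String> rows, Function<String, String> locate, ExecutorService pool) {
    List<CompletableFuture<String>> futures = rows.stream()
        .map(r -> CompletableFuture.supplyAsync(() -> locate.apply(r), pool))
        .collect(Collectors.toList());
    Map<String, String> out = new LinkedHashMap<>();
    for (int i = 0; i < rows.size(); i++) {
      out.put(rows.get(i), futures.get(i).join());
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> rows = new ArrayList<>();
    for (int i = 0; i < 20; i++) {
      rows.add("row" + i);
    }
    // Hypothetical lookup: sleeps to simulate a round trip, then maps the
    // row to one of four made-up regions.
    Function<String, String> slowLocate = row -> {
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      return "region-" + (Integer.parseInt(row.substring(3)) % 4);
    };
    ExecutorService pool = Executors.newFixedThreadPool(8);
    try {
      long t0 = System.nanoTime();
      Map<String, String> seq = locateSequentially(rows, slowLocate);
      long t1 = System.nanoTime();
      Map<String, String> par = locateInParallel(rows, slowLocate, pool);
      long t2 = System.nanoTime();
      System.out.printf("sequential: %d ms, parallel: %d ms, same result: %b%n",
          (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, seq.equals(par));
    } finally {
      pool.shutdown();
    }
  }
}
```

Both methods return the same row-to-region mapping; only the wall-clock time differs, and the gap widens as the number of distinct lookups grows.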
--
This message was sent by Atlassian Jira
(v8.20.1#820001)