[jira] [Commented] (HBASE-29081) Add HBase Read Replica Cluster feature

Dong0829 (Jira) Mon, 07 Jul 2025 19:31:42 -0700


    [ 
https://issues.apache.org/jira/browse/HBASE-29081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18003649#comment-18003649
 ]


Dong0829 commented on HBASE-29081:
----------------------------------

[~andor] Thanks a lot for the update; it's good to know this context. For your 
information, *refresh_meta* and *refresh_hfiles* are not missing, just 
incomplete. For *refresh_meta*, the earlier ticket 
https://issues.apache.org/jira/browse/HBASE-18840 already implemented the 
backend logic in a patch, but it does not provide a user interface yet. 
*refresh_hfiles* is already merged (but without a CLI): 
https://issues.apache.org/jira/browse/HBASE-18448

 

Regarding the current project, I see that you have already implemented 
*refresh_meta* as a procedure, which I think is better than the original 
version. For *refresh_hfiles*, I think you could consider reusing 
RefreshHFilesEndpoint and RefreshHFilesClient.

> Add HBase Read Replica Cluster feature
> --------------------------------------
>
>                 Key: HBASE-29081
>                 URL: https://issues.apache.org/jira/browse/HBASE-29081
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Replication
>            Reporter: Andor Molnar
>            Assignee: Andor Molnar
>            Priority: Major
>
> h1. Objective
> We’d like to implement the open source version of Amazon’s [Read Replica 
> Cluster on 
> S3|https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/]
>   feature for Apache HBase. It adds the ability to run another HBase 
> cluster on the same cloud storage location in read-only mode, allowing users 
> to share the read workload between multiple clusters. Due to the 
> characteristics of the implementation and the lack of automated 
> synchronization between the active and read-replica clusters, read replicas 
> are eventually consistent, hence they’re not suitable for reading the most 
> recent data. However, we still believe that users of open source Apache HBase 
> could take advantage of this feature, and there are use cases that read 
> replicas could help with. Please find more information about the feature in 
> the linked blog post.
> h1. Pros
>  * Running multiple clusters in different Availability Zones adds HA to the 
> entire workload,
>  * No need for data movement or duplication (active-active replication setup) 
> which is cost and time efficient,
>  * No limit for the number of read replica clusters
> h1. Cons
>  * Read Replica clusters are eventually consistent: in-memory data is not 
> visible from read replicas,
>  * Read Replica clusters must be manually refreshed: flush on active cluster, 
> refresh hfiles/meta on read replicas
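
The manual refresh cycle described in the Cons list (flush on the active cluster, then refresh HFiles on the read replica) can be illustrated with a toy model. This is only a sketch under stated assumptions, not HBase code: `SharedStorage`, `ActiveCluster`, `ReadReplica`, and all of their methods are hypothetical names invented for illustration, and the model ignores regions, meta, and WALs entirely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; every class and method name here is hypothetical, not an HBase API. */
public class ReadReplicaSketch {

    /** Stands in for the shared cloud storage location holding immutable "HFiles". */
    static final class SharedStorage {
        private final List<Map<String, String>> hfiles = new ArrayList<>();

        void addHFile(Map<String, String> cells) {
            hfiles.add(new HashMap<>(cells)); // copy so the stored file never changes
        }

        List<Map<String, String>> listHFiles() {
            return new ArrayList<>(hfiles);
        }
    }

    /** The active cluster buffers writes in memory until it is flushed. */
    static final class ActiveCluster {
        private final SharedStorage storage;
        private final Map<String, String> memstore = new HashMap<>();

        ActiveCluster(SharedStorage storage) {
            this.storage = storage;
        }

        void put(String row, String value) {
            memstore.put(row, value);
        }

        /** Persists buffered writes to shared storage as one new file. */
        void flush() {
            if (memstore.isEmpty()) {
                return;
            }
            storage.addHFile(memstore);
            memstore.clear();
        }
    }

    /** The read replica only sees the files it knew about at its last refresh. */
    static final class ReadReplica {
        private final SharedStorage storage;
        private List<Map<String, String>> knownHFiles = new ArrayList<>();

        ReadReplica(SharedStorage storage) {
            this.storage = storage;
        }

        void refreshHFiles() {
            knownHFiles = storage.listHFiles();
        }

        /** Looks the row up in the known files, newest file first. */
        String get(String row) {
            for (int i = knownHFiles.size() - 1; i >= 0; i--) {
                String value = knownHFiles.get(i).get(row);
                if (value != null) {
                    return value;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        SharedStorage storage = new SharedStorage();
        ActiveCluster active = new ActiveCluster(storage);
        ReadReplica replica = new ReadReplica(storage);

        active.put("row1", "v1");
        System.out.println(replica.get("row1")); // null: data is still in memory on the active cluster

        active.flush();
        System.out.println(replica.get("row1")); // null: file exists, but the replica has not refreshed

        replica.refreshHFiles();
        System.out.println(replica.get("row1")); // v1: visible after flush plus refresh
    }
}
```

In this model a write becomes visible on the replica only after both steps have run, which is the eventual consistency the description refers to.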



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
