[ 
https://issues.apache.org/jira/browse/HBASE-29081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andor Molnar updated HBASE-29081:
---------------------------------
    Description: 
h1. Objective

We’d like to implement an open source version of Amazon’s [Read Replica 
Cluster on 
S3|https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/]
  feature for Apache HBase. It adds the ability to run another HBase 
cluster on the same cloud storage location in read-only mode, allowing users to 
share the read workload between multiple clusters. Due to the characteristics 
of the implementation and the lack of automated synchronization between the 
active and read-replica clusters, read replicas are eventually consistent, 
so they are not suitable for reading the most recent data. However, we still 
believe that users of open source Apache HBase could take advantage of this 
feature and that there are use cases that read replicas could help with. 
More information about the feature is available in the linked blog post.
h1. Pros
 * Running multiple clusters in different Availability Zones adds HA to the 
entire workload.
 * No need for data movement or duplication (as in an active-active replication 
setup), which saves cost and time.
 * No limit on the number of read replica clusters.

h1. Cons
 * Read Replica clusters are eventually consistent: in-memory data on the 
active cluster is not visible from read replicas.
 * Read Replica clusters must be refreshed manually: flush on the active 
cluster, then refresh hfiles/meta on the read replicas.
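
The consistency behaviour described in the Cons above can be illustrated with a small toy model. This is only a sketch under stated assumptions: the class and method names (SharedStorage, ActiveCluster, ReadReplicaCluster, flush, refresh) are hypothetical and are not HBase APIs, and plain Python dicts stand in for the memstore and for HFiles on the shared storage location.

```python
# Toy model of the active / read-replica visibility rules.
# All names here are hypothetical; nothing below is an HBase API.


class SharedStorage:
    """Stands in for the shared cloud storage location."""

    def __init__(self):
        # Each flushed file is modelled as an immutable snapshot: row -> value.
        self.hfiles = []


class ActiveCluster:
    """Accepts writes, buffers them in memory, and flushes to shared storage."""

    def __init__(self, storage):
        self.storage = storage
        self.memstore = {}

    def put(self, row, value):
        # Writes land in memory first and are not yet on shared storage.
        self.memstore[row] = value

    def get(self, row):
        # The active cluster sees its own in-memory data, then flushed files.
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.storage.hfiles):
            if row in hfile:
                return hfile[row]
        return None

    def flush(self):
        # Persist the in-memory data as a new file on shared storage.
        if self.memstore:
            self.storage.hfiles.append(dict(self.memstore))
            self.memstore = {}


class ReadReplicaCluster:
    """Read-only view that only knows the files seen at its last refresh."""

    def __init__(self, storage):
        self.storage = storage
        self.known_hfiles = []

    def refresh(self):
        # Manual step: pick up files that appeared since the last refresh.
        self.known_hfiles = list(self.storage.hfiles)

    def get(self, row):
        # Newest known file wins; unflushed or unrefreshed data is invisible.
        for hfile in reversed(self.known_hfiles):
            if row in hfile:
                return hfile[row]
        return None


storage = SharedStorage()
active = ActiveCluster(storage)
replica = ReadReplicaCluster(storage)

active.put("row1", "v1")
before_flush = replica.get("row1")    # data is only in the active memstore
active.flush()
before_refresh = replica.get("row1")  # file exists, replica has not refreshed
replica.refresh()
after_refresh = replica.get("row1")   # replica now sees the flushed value
```

In this model the replica returns nothing for row1 until both the flush on the active cluster and the refresh on the replica have happened, which is the manual two-step procedure listed above.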


> Add HBase Read Replica Cluster feature
> --------------------------------------
>
>                 Key: HBASE-29081
>                 URL: https://issues.apache.org/jira/browse/HBASE-29081
>             Project: HBase
>          Issue Type: Umbrella
>          Components: Replication
>            Reporter: Andor Molnar
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
