Hello everyone,

We've been discussing an idea internally at Cloudera: implementing an open source version of Amazon's EMR Read Replica Clusters on Amazon S3 [1]. A feature that lets us run HBase in read-only mode, with another cluster running on the same storage location, would potentially be beneficial to our customers. The main advantages are optimizing the cost of object storage in the cloud and sharing the read workload between multiple clusters.
On the open source side there is also room for improvement: supporting other object store providers, automating processes that are currently manual, and so on.

We'd greatly appreciate your feedback in the document [2], whether it's about the viability of the idea, areas for improvement, or suggestions to simplify the approach.

[1] https://aws.amazon.com/blogs/big-data/setting-up-read-replica-clusters-with-hbase-on-amazon-s3/
[2] https://docs.google.com/document/d/1EI0lsURX1BZhv3DYgMvZCl4EUy-ADJRkHUc1PjzZtj0/edit?usp=sharing

Regards,
Andor