Hi,

I'm pleased to finally be able to share this design document with you all. It's the result of multiple months of internal review and iteration by half a dozen or so people from within our community (Enis, Devaraj, Artem, and Clay easily come to mind).

Abstract:

<quote>
Infrastructure as a service (IaaS) via public cloud infrastructure offerings (Cloud IaaS) has grown dramatically in popularity through services like Amazon EC2, Google Compute Engine, and Microsoft Azure Compute. Across Apache HBase users, the majority of new system architectures include some form of Cloud IaaS as a means to increase the capabilities and/or decrease the cost of operation of their system. However, deploying HBase on these platforms comes with difficulties as HBase has a non-optional dependency on Apache Hadoop HDFS to guarantee the durability of data written to HBase. This document outlines a proposal to remove HBase’s dependency on HDFS by replacing the current Write-Ahead-Log (WAL) implementation using Apache Ratis (incubating). It covers why the HDFS dependency is a problem on Cloud IaaS, how Ratis can be used to replace HDFS-based WALs, and a high-level development plan to effectively implement the replacement of this extremely critical HBase internal component without becoming tied to a single Cloud IaaS offering.
</quote>

The document is available on Google Docs [1], and there is also a PDF [2] of the current version. I'm happy to assist those who do not want to use the copy on a Google service (e.g. by transcribing mailing-list chatter onto the Doc).

Thanks to some of the same folks who helped with this document, I also have a fairly in-depth analysis of what we think the required work will entail. For the HBase-specific changes, I'd like to avoid the pitfall we commonly face and work towards frequent merges into master that do not destabilize the build (keep things "Green"), so that we do not stall our forward momentum after 2.0. If people are curious/interested, I'm happy to delve further into how I think we can implement this.

- Josh

[1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#
[2] https://home.apache.org/~elserj/Effective%20HBase%20in%20the%20Cloud.pdf
