Hi,
I'm pleased to finally be able to share this design document with you
all. It's the result of multiple months of review and iteration by half
a dozen or so people from within our community (Enis, Devaraj, Artem,
and Clay easily come to mind).
Abstract:
<quote>
Infrastructure as a service (IaaS) via public cloud infrastructure
offerings (Cloud IaaS) has grown dramatically in popularity through
services like Amazon EC2, Google Compute Engine, and Microsoft Azure
Compute. Across Apache HBase users, the majority of new system
architectures include some form of Cloud IaaS as a means to increase the
capabilities and/or decrease the cost of operation of their system.
However, deploying HBase on these platforms comes with difficulties as
HBase has a non-optional dependency on Apache Hadoop HDFS to guarantee
the durability of data written to HBase. This document outlines a
proposal to remove HBase’s dependency on HDFS by replacing the current
Write-Ahead-Log (WAL) implementation using Apache Ratis (incubating). It
covers why the HDFS dependency is a problem on Cloud IaaS, how Ratis can
be used to replace HDFS-based WALs, and a high-level development plan to
effectively implement the replacement of this extremely critical HBase
internal component without becoming tied to a single Cloud IaaS offering.
</quote>
The document is available on Google Docs [1], and a PDF of the current
version is also available [2]. I'm happy to assist those who do not
want to use the copy on a Google service (e.g. transcribe mailing-list
chatter onto the Doc).
Thanks to some of the same folks who helped with this document, I also
have a fairly in-depth analysis of what we think the required work will
entail. For the HBase-specific changes, I'd like to avoid the pitfall we
commonly face and work towards frequent merges into master that do not
destabilize the build (keep things "Green") to avoid stalling our
forward momentum after 2.0. If people are curious/interested, I'm happy
to delve some more into how I think we can implement this.
- Josh
[1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#
[2] https://home.apache.org/~elserj/Effective%20HBase%20in%20the%20Cloud.pdf