Thanks for the read, Lars!

That's a good question on the sequenceid part. Like my reply to Chia-Ping, I don't think I have a good answer at this point.

I would assume that there could/should be common sequenceid logic across WAL implementations, but I'm not sure if it's better done as "helper code" or as some "service" in each RegionServer. That's something we'll need to keep in mind for certain :)

On 5/4/18 5:54 AM, Lars Francke wrote:
Josh,

thanks to you (and all the others working on this). I did read it once and
I think it sounds very sane. It answers questions that I face more and more
from customers. I have not looked at Ratis in detail so I can't comment on
the challenge of adopting it but I agree with the comments on avoiding the
complexity of requiring Kafka/DistributedLog. A nice and clean API would
give us the opportunity to leverage other services more easily in the
future as well.

This is a minor detail and I'm no expert here (and definitely haven't
thought through all ramifications) but do you still plan on having the WAL
hand out sequenceids or shall that be moved out of that implementation as
well?

Cheers,
Lars

On Thu, May 3, 2018 at 6:04 PM, Josh Elser <els...@apache.org> wrote:

Hi,

I'm pleased to finally be able to share this design document with you all.
It's the result of internal review by half a dozen or so people from within our
community (Enis, Devaraj, Artem, and Clay easily come to mind) after
multiple months of review and iteration.

Abstract:

<quote>
Infrastructure as a service (IaaS) via public cloud infrastructure
offerings (Cloud IaaS) has grown dramatically in popularity through
services like Amazon EC2, Google Compute Engine, and Microsoft Azure
Compute. Across Apache HBase users, the majority of new system
architectures include some form of Cloud IaaS as a means to increase the
capabilities and/or decrease the cost of operation of their system.
However, deploying HBase on these platforms comes with difficulties as
HBase has a non-optional dependency on Apache Hadoop HDFS to guarantee the
durability of data written to HBase. This document outlines a proposal to
remove HBase’s dependency on HDFS by replacing the current Write-Ahead-Log
(WAL) implementation using Apache Ratis (incubating). It covers why the
HDFS dependency is a problem on Cloud IaaS, how Ratis can be used to
replace HDFS-based WALs, and a high-level development plan to effectively
implement the replacement of this extremely critical HBase internal
component without becoming tied to a single Cloud IaaS offering.
</quote>

The document is available on Google Docs [1] and there is also a PDF
available [2] of the current version. I'm happy to assist those who do not
want to use the copy on a Google service (e.g. transcribe mailing-list
chatter onto the Doc).

Thanks to some of the same folks who helped with this document, I also
have a fairly in-depth analysis of what we think the required work will
entail. For the HBase-specific changes, I'd like to avoid the pitfall we
commonly face and work towards frequent merges into master that do not
destabilize the build (keep things "Green") to avoid stalling our forward
momentum after 2.0. If people are curious/interested, I'm happy to delve
some more into how I think we can implement this.

- Josh

[1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#
[2] https://home.apache.org/~elserj/Effective%20HBase%20in%20the%20Cloud.pdf

