Let me give an update on-list for everyone:
First and foremost, thank you very much to everyone who took the time to
read this, with an extra thanks to those who participated in discussion.
There were lots of great points raised: some about things that were
unclear in the doc, and others shining light onto subjects I hadn't
considered yet.
My biggest take-away is that I complicated this document by tying it too
closely with "HBase on Cloud", treating the WAL+Ratis LogService as the
only/biggest thing to figure out. This was inaccurate and overly bold of
me: I apologize. I think this complicated discussion on a number of
points, and ate a good bit of time for some of you.
My goal was to present this as an important part of a transition to the
"cloud", giving justification to what WAL+Ratis helps HBase achieve. I
did not want this document to be a step-by-step guide to a perfect HBase
on Cloud design. I need to do a better job with this in the future; sorry.
That said, my feeling is that, on the whole, folks are in support of the
proposed changes/architecture described for the WAL+Ratis work (tl;dr
revisit WAL API, plug in current WAL implementation to any API
modification, build new Ratis-backed WAL impl). There were some concerns
which still need immediate action that I am aware of:
* Sync with Ram and Anoop re: in-memory WAL [1]
* Where is Ratis LogService metadata kept? How do we know what
LogStreams were being used/maintained by a RS? How does this tie into
recovery?
There are also long-term concerns which I don't think I have an answer
for yet (either for reasons out of my control or for lack of technical
understanding):
* Maturity of the Ratis community
* Required performance by HBase and the ability of the LogService to
provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging
down disks, ability to scale RAFT quorums).
* Continue with WAL-per-RS or move to WAL-per-Region? Related to perf,
dependent upon Ratis scalability.
* I/O amplification on WAL retention for backup&restore and replication
("logstream export")
* Ensure that LogStreams can be exported to a dist-filesystem in a
manner which requires no additional metadata/handling (avoid more
storage/mgmt complexity)
* Ability to build krb5 authn into Ratis (really, gRPC)
I will continue with the two immediate action items. I think the latter
concerns are some that will require fingers-on-keyboard -- I don't know
enough about runtime characteristics without seeing it for myself.
All this said, I'd like to start moving toward the point where we start
breaking out this work into a feature-branch off of master and start
building code. My hope is that this is amenable to everyone, with the
acknowledgement that the Ratis work is considered "experimental" and not an
attempt to make all of HBase use Ratis-backed WALs.
Finally, I do *not* want this message to be interpreted as me squashing
anyone's concerns. My honest opinion is that discussion has died down,
but I will be the first to apologize if I have missed any outstanding
concerns. Please, please, please ping me if I am negligent.
Thanks once again for everyone's participation.
[1]
https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=AAAACBm3RLM
On 2018/07/13 20:15:45, Josh Elser <[email protected]> wrote:
> Hi all,
A long time ago, I shared a document about a (I'll call it..) "vision"
where we make some steps towards decoupling HBase from HDFS in an effort
to make deploying HBase on Cloud IaaS providers a bit easier
(operational simplicity, effective use of common IaaS paradigms, etc).
https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing
A good ask from our Stack back then was: "[can you break down this
work]?" The original document was very high-level, and asking for some
more details makes a lot of sense. Months later, I'd like to share that
I've updated the original document with some new content at the bottom
(as well as addressed some comments which went unanswered by me --
sorry!)
Based on a discussion I had earlier this week (and some discussions
during HBaseCon in California in June), I've tried to add a brief
"refresher" on what some of the big goals for this effort are. Please
check it out at your leisure and let me know what you think. Would like
to start getting some fingers behind this all and pump out some code :)
https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#bookmark=id.fml9ynrqagk
- Josh