Thanks, Zach!
I like your suggestion about project updates. I sincerely hope that this
can be something transparent enough that folks who want to follow along and
participate in implementation can do so. Let me think about how to drive
this better.
On 7/25/18 3:55 PM, Zach York wrote:
+1 to starting the work. I think most of the concerns can be figured out on
the JIRAs and we can have a project update every X weeks if enough people
are interested.
I also agree that we should frame the feature correctly. "Decoupling from an
HDFS WAL" or "WAL on Ratis" would be more appropriate names that better convey the
scope. I think there are a number of projects necessary to complete "HBase
on Cloud" with this being one of those.
Thanks for driving this initiative!
Zach
On Wed, Jul 25, 2018 at 11:55 AM, Josh Elser <[email protected]> wrote:
Let me give an update on-list for everyone:
First and foremost, thank you very much to everyone who took the time to
read this, with an extra thanks to those who participated in discussion.
There were lots of great points raised. Some about things that were unclear
in the doc, and others shining light onto subjects I hadn't considered yet.
My biggest take-away is that I complicated this document by tying it too
closely to "HBase on Cloud", treating the WAL+Ratis LogService as the
only/biggest thing to figure out. This was inaccurate and overly bold of
me: I apologize. I think this complicated discussion on a number of points,
and ate up a good bit of your time.
My goal was to present this as an important part of a transition to the
"cloud", giving justification to what WAL+Ratis helps HBase achieve. I did
not want this document to be a step-by-step guide to a perfect HBase on
Cloud design. I need to do a better job with this in the future; sorry.
That said, my feeling is that, on the whole, folks are in support of the
proposed changes/architecture described for the WAL+Ratis work (tl;dr
revisit WAL API, plug in current WAL implementation to any API
modification, build new Ratis-backed WAL impl). There were some concerns
which still need immediate action that I am aware of:
* Sync with Ram and Anoop re: in-memory WAL [1]
* Where is Ratis LogService metadata kept? How do we know what LogStreams
were being used/maintained by a RS? How does this tie into recovery?
There are also long-term concerns which I don't think I have an answer for
yet (either for reasons out of my control or for lack of technical
understanding):
* Maturity of the Ratis community
* Required performance by HBase and the ability of the LogService to
provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging down
disks, ability to scale RAFT quorums).
* Continue with WAL-per-RS or move to WAL-per-Region? Related to perf,
dependent upon Ratis scalability.
* I/O amplification on WAL retention for backup&restore and replication
("logstream export")
* Ensure that LogStreams can be exported to a dist-filesystem in a manner
which requires no additional metadata/handling (avoid more storage/mgmt
complexity)
* Ability to build krb5 authn into Ratis (really, gRPC)
I will continue with the two immediate action items. I think the latter
concerns are some that will require fingers-on-keyboard -- I don't know
enough about runtime characteristics without seeing it for myself.
All this said, I'd like to start moving toward the point where we start
breaking out this work into a feature-branch off of master and start
building code. My hope is that this is amenable to everyone, with the
acknowledgment that the Ratis work is considered "experimental" and not an
attempt to make all of HBase use Ratis-backed WALs.
Finally, I do *not* want this message to be interpreted as me squashing
anyone's concerns. My honest opinion is that discussion has died down, but
I will be the first to apologize if I have missed any outstanding concerns.
Please, please, please ping me if I am negligent.
Thanks once again for everyone's participation.
[1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=AAAACBm3RLM
On 2018/07/13 20:15:45, Josh Elser <[email protected]> wrote:
Hi all,
A long time ago, I shared a document about a (I'll call it..) "vision"
where we make some steps towards decoupling HBase from HDFS in an effort to
make deploying HBase on Cloud IaaS providers a bit easier (operational
simplicity, effective use of common IaaS paradigms, etc).
https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing
A good ask from our Stack back then was: "[can you break down this
work]?" The original document was very high-level, and asking for some more
details makes a lot of sense. Months later, I'd like to share that I've
updated the original document with some new content at the bottom (as well
as addressed some comments which went unanswered by me -- sorry!)
Based on a discussion I had earlier this week (and some discussions
during HBaseCon in California in June), I've tried to add a brief
"refresher" on what some of the big goals for this effort are. Please check
it out at your leisure and let me know what you think. Would like to start
getting some fingers behind this all and pump out some code :)
https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#bookmark=id.fml9ynrqagk
- Josh