+1 to starting the work. I think most of the concerns can be figured out on the JIRAs and we can have a project update every X weeks if enough people are interested.
I also agree that we should frame the feature correctly. "Decoupling from an HDFS WAL" or "WAL on Ratis" would be more appropriate names that better convey the scope. I think there are a number of projects necessary to complete "HBase on Cloud", with this being one of them.

Thanks for driving this initiative!

Zach

On Wed, Jul 25, 2018 at 11:55 AM, Josh Elser <[email protected]> wrote:

> Let me give an update on-list for everyone:
>
> First and foremost, thank you very much to everyone who took the time to
> read this, with an extra thanks to those who participated in discussion.
> There were lots of great points raised. Some about things that were unclear
> in the doc, and others shining light onto subjects I hadn't considered yet.
>
> My biggest take-away is that I complicated this document by tying it too
> closely with "HBase on Cloud", treating the WAL+Ratis LogService as the
> only/biggest thing to figure out. This was inaccurate and overly bold of
> me: I apologize. I think this complicated discussion on a number of points,
> and ate a good bit of some of your time.
>
> My goal was to present this as an important part of a transition to the
> "cloud", giving justification to what WAL+Ratis helps HBase achieve. I did
> not want this document to be a step-by-step guide to a perfect HBase on
> Cloud design. I need to do a better job with this in the future; sorry.
>
> That said, my feeling is that, on the whole, folks are in support of the
> proposed changes/architecture described for the WAL+Ratis work (tl;dr
> revisit WAL API, plug in current WAL implementation to any API
> modification, build new Ratis-backed WAL impl). There were some concerns
> which still need immediate action that I am aware of:
>
> * Sync with Ram and Anoop re: in-memory WAL [1]
> * Where is Ratis LogService metadata kept? How do we know what LogStreams
>   were being used/maintained by a RS? How does this tie into recovery?
>
> There are also long-term concerns which I don't think I have an answer for
> yet (either for reasons out of my control or a lack of technical
> understanding):
>
> * Maturity of the Ratis community
> * Required performance by HBase and the ability of the LogService to
>   provide that perf (Areas already mentioned: gRPC perf, fsyncs bogging down
>   disks, ability to scale RAFT quorums).
> * Continue with WAL-per-RS or move to WAL-per-Region? Related to perf,
>   dependent upon Ratis scalability.
> * I/O amplification on WAL retention for backup&restore and replication
>   ("logstream export")
> * Ensure that LogStreams can be exported to a dist-filesystem in a manner
>   which requires no additional metadata/handling (avoid more storage/mgmt
>   complexity)
> * Ability to build krb5 authn into Ratis (really, gRPC)
>
> I will continue the two immediate action items. I think the latter
> concerns are some that will require fingers-on-keyboard -- I don't know
> enough about runtime characteristics without seeing it for myself.
>
> All this said, I'd like to start moving toward the point where we start
> breaking out this work into a feature-branch off of master and start
> building code. My hope is that this is amenable to everyone, with the
> acknowledgement that the Ratis work is considered "experimental" and not an
> attempt to make all of HBase use Ratis-backed WALs.
>
> Finally, I do *not* want this message to be interpreted as me squashing
> anyone's concerns. My honest opinion is that discussion has died down, but
> I will be the first to apologize if I have missed any outstanding concerns.
> Please, please, please ping me if I am negligent.
>
> Thanks once again for everyone's participation.
>
> [1] https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?disco=AAAACBm3RLM
>
> On 2018/07/13 20:15:45, Josh Elser <[email protected]> wrote:
>> Hi all,
>>
>> A long time ago, I shared a document about a (I'll call it..) "vision"
>> where we make some steps towards decoupling HBase from HDFS in an effort to
>> make deploying HBase on Cloud IaaS providers a bit easier (operational
>> simplicity, effective use of common IaaS paradigms, etc).
>>
>> https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit?usp=sharing
>>
>> A good ask from our Stack back then was: "[can you break down this
>> work]?" The original document was very high-level, and asking for some more
>> details makes a lot of sense. Months later, I'd like to share that I've
>> updated the original document with some new content at the bottom (as well
>> as addressed some comments which went unanswered by me -- sorry!)
>>
>> Based on a discussion I had earlier this week (and some discussions
>> during HBaseCon in California in June), I've tried to add a brief
>> "refresher" on what some of the big goals for this effort are. Please check
>> it out at your leisure and let me know what you think. Would like to start
>> getting some fingers behind this all and pump out some code :)
>>
>> https://docs.google.com/document/d/1Su5py_T5Ytfh9RoTTX2s20KbSJwBHVxbO7ge5ORqbCk/edit#bookmark=id.fml9ynrqagk
>>
>> - Josh
