[ https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746879#comment-15746879 ]
Joep Rottinghuis commented on HBASE-17018:
------------------------------------------

Interesting idea about using attributes on the mutation itself. Wouldn't that mess with the way BufferedMutatorImpl deals with them? I don't want those attributes to be transmitted to HBase; that would be a waste. I'll read up more, but on first inspection I should be able to stash the flushCount and the submit and completion times in a byte[]. Putting a flushLatch there would be harder; I'd have to think about whether that can be stashed and communicated in a different way. The other impact is that I'd have to peel apart a list of mutations and set attributes on each. Right now a submission maintains a List of Mutations so that they can be added to the BufferedMutatorImpl in batch.

bq. This will not be committed to hbase? It'll be part of timeline v2?

Is that a question or a statement? ;) If the HBase community is interested in having this be part of HBase, that would be great and I'll continue the code in place. If not, I'll move this to YARN.

bq. The bulk of the change in BufferedMutatorParams is unrelated. You want to do a patch w/ just the changes to hbase core removing the non-changes: i.e. in BufferedMutatorParams only change should be the clone method addition, not the reformatting of javadoc.

Reformatting the javadoc wasn't intended; I'll remove that. In fact, I'll file a separate sub-jira that just adds the clone method to BufferedMutatorParams, to separate out that concern.

bq. Is FilesystemMutationSpooler still TODO? Is it needed? There doesn't seem to be much filesystem-ey about FilesystemMutationSpooler, at least just yet.

Indeed, this is still a completely empty template; the actual implementation is still open. I didn't want to go too far with the implementation, just sketch out enough to make the design clear and get feedback on it. I started with the diagram and a description, but as I thought through more details the design had to be tweaked.
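The byte[] stashing mentioned above could look roughly like the following. This is a minimal stdlib-only sketch (the class and method names are hypothetical, not from the patch): the flushCount and the submit/completion times are packed into a single byte[] that could then be attached to a mutation as one attribute value, e.g. via Mutation#setAttribute.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: pack flushCount plus submit/completion timestamps
// into one byte[] so the bookkeeping could ride along as a single
// mutation attribute instead of three separate ones.
public class SpoolMeta {

    static byte[] pack(long flushCount, long submitTs, long completionTs) {
        return ByteBuffer.allocate(3 * Long.BYTES)
                .putLong(flushCount)
                .putLong(submitTs)
                .putLong(completionTs)
                .array();
    }

    static long[] unpack(byte[] packed) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        return new long[] { buf.getLong(), buf.getLong(), buf.getLong() };
    }

    public static void main(String[] args) {
        long[] out = unpack(pack(42L, 1000L, 2000L));
        // prints: 42 1000 2000
        System.out.println(out[0] + " " + out[1] + " " + out[2]);
    }
}
```

Note this only covers plain values; as said above, a flushLatch cannot be flattened into bytes this way and would need a separate mechanism.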
I figured POC code would do the best job of describing the design. With respect to public boolean shouldSpool(): indeed, the code right now is a bit more verbose than needed. I'll collapse it to the simple form if I indeed don't need to keep track of the maximum successful flushCount. I need to add actual tests before I can pin down those details.

bq. Is SpoolingBufferedMutatorCoordinator Interface a bit over the top? Is there ever going to be another type of coordinator implementation?

Yes indeed, and no, probably not. I started with this thinking I needed to make it pluggable for testing, but you're right that no interface is needed there; I can simply use inheritance and still control the tests.

bq. Otherwise skimmed the rest.. Where are the tests?

Tests are indeed still missing; this is just a design sketch at the moment. If the approach seems sane, I'll add unit tests.

Thanks for the feedback [~stack]

> Spooling BufferedMutator
> ------------------------
>
>                 Key: HBASE-17018
>                 URL: https://issues.apache.org/jira/browse/HBASE-17018
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Joep Rottinghuis
>         Attachments: HBASE-17018.master.001.patch, HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase requirements for fault tolerant writer.pdf
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is (temporarily) down, for example in case of an HBase upgrade.
> Most of the high-volume writes will be on a best-effort basis, but occasionally we do a flush. Mainly during application lifecycle events, clients will call a flush on the timeline service API. In order to handle the volume of writes we use a BufferedMutator. When flush gets called on our API, we in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a filesystem in case of HBase errors.
> If we use the Hadoop filesystem interface, this can then be HDFS, gcs, s3, or any other distributed storage. The mutations can then later be re-played, for example through a MapReduce job.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
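The spool-and-replay idea in the description could be sketched roughly as follows. This is a stdlib-only illustration under stated assumptions (class and method names are hypothetical): mutations are written as length-prefixed records to a stream — in the real design that stream would come from the Hadoop FileSystem interface, so the target can be HDFS, gcs, s3, etc. — and the same records are read back later for replay.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of spooling serialized mutations as length-prefixed
// records and reading them back for replay. Byte arrays stand in for
// serialized Mutations; a ByteArrayOutputStream stands in for an
// FSDataOutputStream obtained from the Hadoop FileSystem interface.
public class SpoolFile {

    static byte[] spool(List<byte[]> mutations) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            for (byte[] m : mutations) {
                out.writeInt(m.length); // length prefix
                out.write(m);           // serialized mutation bytes
            }
        }
        return bos.toByteArray();
    }

    static List<byte[]> replay(byte[] spooled) throws IOException {
        List<byte[]> mutations = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(spooled))) {
            while (in.available() > 0) {
                byte[] m = new byte[in.readInt()];
                in.readFully(m);
                mutations.add(m);
            }
        }
        return mutations;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> out =
            replay(spool(Arrays.asList("put-a".getBytes(), "put-b".getBytes())));
        System.out.println(new String(out.get(0)) + "," + new String(out.get(1)));
    }
}
```

A record-per-mutation format like this is also what would let a later MapReduce job split the spool file and re-apply the mutations in parallel.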