[ https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15746879#comment-15746879 ]
Joep Rottinghuis commented on HBASE-17018:
------------------------------------------

Interesting idea about using attributes on the mutation itself. Wouldn't that mess with the way BufferedMutatorImpl deals with them? I don't want those attributes to be transmitted to HBase; that would be a waste. I'll read up more, but on first inspection I should be able to stash the flushCount and the submit and completion times in a byte[]. Putting a flushLatch there would be harder; I'd have to think about whether that can be stashed and communicated in a different way. The other impact is that I'd have to peel apart a list of mutations and set attributes on each. Right now a submission maintains a List of Mutations so that they can be added to the BufferedMutatorImpl in batch.

bq. This will not be committed to hbase? It'll be part of timeline v2?

Is that a question or a statement? ;) If the HBase community is interested in having this be part of HBase, that would be great and I'll continue the code in place. If not, I'll move this to YARN.

bq. The bulk of the change in BufferedMutatorParams is unrelated. You want to do a patch w/ just the changes to hbase core removing the non-changes: i.e. in BufferedMutatorParams only change should be the clone method addition, not the reformatting of javadoc.

Reformatting the javadoc wasn't intended; I'll remove that. In fact, I'll file a separate sub-jira that just adds the clone method to BufferedMutatorParams, to separate out that concern.

bq. Is FilesystemMutationSpooler still TODO? Is it needed? There doesn't seem to be much filesystem-ey about FilesystemMutationSpooler, at least just yet.

Indeed, this is still a completely empty template; the actual implementation is still open. I didn't want to go too far with the implementation, just sketch out enough to make the design clear and get feedback on it. I started with the diagram and a description, but as I thought through more details the design had to be tweaked.
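The byte[] stashing mentioned above could look roughly like the following. This is a minimal stdlib-only sketch (the class and method names are hypothetical, not from the patch): the flushCount and the submit/completion times are packed into a single byte[] that could then be attached to a mutation as one attribute value, e.g. via Mutation#setAttribute.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: pack flushCount plus submit/completion timestamps
// into one byte[] so the bookkeeping could ride along as a single
// mutation attribute instead of three separate ones.
public class SpoolMeta {

    static byte[] pack(long flushCount, long submitTs, long completionTs) {
        return ByteBuffer.allocate(3 * Long.BYTES)
                .putLong(flushCount)
                .putLong(submitTs)
                .putLong(completionTs)
                .array();
    }

    static long[] unpack(byte[] packed) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        return new long[] { buf.getLong(), buf.getLong(), buf.getLong() };
    }

    public static void main(String[] args) {
        long[] out = unpack(pack(42L, 1000L, 2000L));
        // prints: 42 1000 2000
        System.out.println(out[0] + " " + out[1] + " " + out[2]);
    }
}
```

Note this only covers plain values; as said above, a flushLatch cannot be flattened into bytes this way and would need a separate mechanism.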
I figured POC code would do the best job of describing the design. With respect to public boolean shouldSpool(): indeed, the code right now is a bit more verbose than needed. I'll collapse it to the simple form if I indeed don't need to keep track of the maximum successful flushCount. I need to add actual tests before I can pin down those details.

bq. Is SpoolingBufferedMutatorCoordinator Interface a bit over the top? Is there ever going to be another type of coordinator implementation?

Yes indeed, and no, probably not. I started with this thinking I needed to make it pluggable for testing, but you're right that no interface is needed there; I can simply use inheritance and still control the tests.

bq. Otherwise skimmed the rest.. Where are the tests?

Tests are indeed still missing; this is just a design sketch at the moment. If the approach seems sane, I'll add unit tests.

Thanks for the feedback [~stack]

> Spooling BufferedMutator
> ------------------------
>
>                 Key: HBASE-17018
>                 URL: https://issues.apache.org/jira/browse/HBASE-17018
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Joep Rottinghuis
>         Attachments: HBASE-17018.master.001.patch, HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase requirements for fault tolerant writer.pdf
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is (temporarily) down, for example in case of an HBase upgrade.
> Most of the high-volume writes will be on a best-effort basis, but occasionally we do a flush. Mainly during application lifecycle events, clients will call a flush on the timeline service API. In order to handle the volume of writes we use a BufferedMutator. When flush gets called on our API, we in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a filesystem in case of HBase errors.
> If we use the Hadoop filesystem interface, this can then be HDFS, gcs, s3, or any other distributed storage. The mutations can then later be re-played, for example through a MapReduce job.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
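The spool-and-replay idea in the description could be sketched roughly as follows. This is a stdlib-only illustration under stated assumptions (class and method names are hypothetical): mutations are written as length-prefixed records to a stream — in the real design that stream would come from the Hadoop FileSystem interface, so the target can be HDFS, gcs, s3, etc. — and the same records are read back later for replay.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of spooling serialized mutations as length-prefixed
// records and reading them back for replay. Byte arrays stand in for
// serialized Mutations; a ByteArrayOutputStream stands in for an
// FSDataOutputStream obtained from the Hadoop FileSystem interface.
public class SpoolFile {

    static byte[] spool(List<byte[]> mutations) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            for (byte[] m : mutations) {
                out.writeInt(m.length); // length prefix
                out.write(m);           // serialized mutation bytes
            }
        }
        return bos.toByteArray();
    }

    static List<byte[]> replay(byte[] spooled) throws IOException {
        List<byte[]> mutations = new ArrayList<>();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(spooled))) {
            while (in.available() > 0) {
                byte[] m = new byte[in.readInt()];
                in.readFully(m);
                mutations.add(m);
            }
        }
        return mutations;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> out =
            replay(spool(Arrays.asList("put-a".getBytes(), "put-b".getBytes())));
        System.out.println(new String(out.get(0)) + "," + new String(out.get(1)));
    }
}
```

A record-per-mutation format like this is also what would let a later MapReduce job split the spool file and re-apply the mutations in parallel.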