[ 
https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joep Rottinghuis updated HBASE-17018:
-------------------------------------
    Attachment: HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf
                HBASE-17018.master.001.patch

Based on feedback, I've attached a new design doc. 
This is the shared Google doc open for comments:
https://docs.google.com/document/d/1GTSk1Hd887gGJduUr8ZJ2m-VKrIXDUv9K3dr4u2YGls/edit?usp=sharing

I've also attached a sketch of what the code would look like 
(HBASE-17018.master.001.patch) to clarify the design in a bit more 
detail than the doc does. It is ready for design feedback, not for 
code feedback. It has tons of TODOs and open items and lacks any unit 
tests.
If people think this is a sensible approach, I can work out the code in more 
detail, with unit tests and a POC for the spooling code.

> Spooling BufferedMutator
> ------------------------
>
>                 Key: HBASE-17018
>                 URL: https://issues.apache.org/jira/browse/HBASE-17018
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Joep Rottinghuis
>         Attachments: HBASE-17018.master.001.patch, 
> HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase requirements 
> for fault tolerant writer.pdf
>
>
> For YARN Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is 
> (temporarily) down, for example in case of an HBase upgrade.
> Most of the high-volume writes will be on a best-effort basis, but 
> occasionally we do a flush. Mainly during application lifecycle events, 
> clients will call a flush on the timeline service API. In order to handle the 
> volume of writes we use a BufferedMutator. When flush gets called on our API, 
> we in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a 
> filesystem in case of HBase errors. If we use the Hadoop filesystem 
> interface, this can then be HDFS, GCS, S3, or any other distributed storage. 
> The mutations can then be replayed later, for example through a MapReduce 
> job.
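
To make the spool-and-replay idea in the description concrete, here is a
minimal sketch in plain Java. It is not the code from the attached patch and
it does not use the HBase or Hadoop APIs. MutationSink, SpoolingBuffer, and
the one-mutation-per-line text spool file are all hypothetical names and
simplifications chosen for illustration. A real version would presumably wrap
a BufferedMutator and write through the Hadoop filesystem interface.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the backing store (e.g. a BufferedMutator). */
interface MutationSink {
    void write(List<String> mutations) throws IOException;
}

/**
 * Buffers mutations in memory. On flush, the batch goes to the sink; if the
 * sink fails, the batch is appended to a spool file instead.
 * Assumption: each mutation is a single line of text without newlines.
 */
class SpoolingBuffer {
    private final MutationSink sink;
    private final Path spoolFile;
    private final List<String> buffer = new ArrayList<>();

    SpoolingBuffer(MutationSink sink, Path spoolFile) {
        this.sink = sink;
        this.spoolFile = spoolFile;
    }

    void mutate(String mutation) {
        buffer.add(mutation);
    }

    /** Returns true if the sink accepted the batch, false if it was spooled. */
    boolean flush() throws IOException {
        if (buffer.isEmpty()) {
            return true;
        }
        boolean delivered;
        try {
            sink.write(new ArrayList<>(buffer));
            delivered = true;
        } catch (IOException sinkFailure) {
            // Sink is unavailable: append the batch to the spool file.
            Files.write(spoolFile, buffer, StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE, StandardOpenOption.APPEND);
            delivered = false;
        }
        // Only reached if the batch was delivered or spooled successfully.
        buffer.clear();
        return delivered;
    }

    /** Replays spooled mutations into the sink; returns how many were replayed. */
    int replay() throws IOException {
        if (!Files.exists(spoolFile)) {
            return 0;
        }
        List<String> spooled = Files.readAllLines(spoolFile, StandardCharsets.UTF_8);
        sink.write(spooled);
        // Delete only after the sink accepted everything.
        Files.delete(spoolFile);
        return spooled.size();
    }
}
```

In this sketch the spool file is removed only after the sink has accepted the
replayed batch, so a failed replay leaves the spooled mutations in place for
another attempt. Replay here is a direct method call; the description suggests
a MapReduce job for the real thing.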



