[ https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768468#comment-15768468 ]

Sangjin Lee commented on HBASE-17018:
-------------------------------------

Your suggestion is interesting, [~enis]. Thanks for the idea.

In addition to what Joep mentioned above, I do worry about the capacity 
requirement a dual-writing system would have. It would essentially double the 
HDFS storage requirement, and at large scale that would add up to a 
meaningful amount.
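
For a rough sense of scale (numbers purely illustrative): at 5 TB/day of 
timeline writes with 3x HDFS replication, HBase already consumes ~15 TB/day of 
raw capacity, and dual-writing the same data to flat files would add another 
~15 TB/day, i.e. on the order of 5 PB/year of extra raw storage.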

Also, how would a reader work in the case where the data made it into HDFS but 
not into HBase (e.g. the HBase cluster was down for a while for an upgrade)? 
Would the reader still query HBase only and return no data if HBase is missing 
the data? If we want to address that situation, we're back to unspooling 
(migrating the missing data from the backup location into HBase). I'm just 
trying to round out the idea... Thanks!
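
To make that question concrete, here is a rough sketch of what the unspooling 
step might look like, assuming only the public BufferedMutator API. 
SpooledMutationReader and the spool layout are purely hypothetical placeholders 
for whatever format the spooled mutations end up in.

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Mutation;

public class SpoolReplayer {

  /** Hypothetical reader over mutations that were spooled to the filesystem. */
  interface SpooledMutationReader extends Closeable {
    List<Mutation> nextBatch() throws IOException; // null when exhausted
  }

  /** Replays spooled mutations into HBase once the cluster is back up. */
  public static void replay(SpooledMutationReader reader, TableName table)
      throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         BufferedMutator mutator = conn.getBufferedMutator(table)) {
      List<Mutation> batch;
      while ((batch = reader.nextBatch()) != null) {
        mutator.mutate(batch); // re-apply in spool order
      }
      mutator.flush(); // ensure everything is durable in HBase
    }
  }
}
{code}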

> Spooling BufferedMutator
> ------------------------
>
>                 Key: HBASE-17018
>                 URL: https://issues.apache.org/jira/browse/HBASE-17018
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Joep Rottinghuis
>         Attachments: HBASE-17018.master.001.patch, 
> HBASE-17018.master.002.patch, HBASE-17018.master.003.patch, 
> HBASE-17018.master.004.patch, 
> HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase requirements 
> for fault tolerant writer.pdf
>
>
> For YARN Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is 
> (temporarily) down, for example in case of an HBase upgrade.
> Most of the high-volume writes will be on a best-effort basis, but 
> occasionally we do a flush. Mainly during application lifecycle events, 
> clients will call a flush on the timeline service API. In order to handle the 
> volume of writes we use a BufferedMutator. When flush gets called on our API, 
> we in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a 
> filesystem in case of HBase errors (see the sketch below this description). 
> If we use the Hadoop filesystem interface, this can then be HDFS, GCS, S3, or 
> any other distributed storage. The mutations can then later be re-played, for 
> example through a MapReduce job.
> https://reviews.apache.org/r/54882/
> For design of SpoolingBufferedMutatorImpl see 
> https://docs.google.com/document/d/1GTSk1Hd887gGJduUr8ZJ2m-VKrIXDUv9K3dr4u2YGls/edit?usp=sharing
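
To make the quoted description concrete, here is a minimal sketch of the 
wrapper idea, assuming only the public BufferedMutator API. The failure 
detection and spool format below are placeholders, not the actual design from 
the attached PDF or patches.

{code:java}
import java.io.IOException;
import java.util.List;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Mutation;

/** Illustrative spooling wrapper around a regular BufferedMutator. */
public class SpoolingMutatorSketch {

  private final BufferedMutator delegate; // normal HBase write path
  private final FileSystem fs;            // HDFS, GCS, S3, ... via the Hadoop FS API
  private final Path spoolDir;            // where failed batches get spooled

  public SpoolingMutatorSketch(BufferedMutator delegate, FileSystem fs, Path spoolDir) {
    this.delegate = delegate;
    this.fs = fs;
    this.spoolDir = spoolDir;
  }

  public void mutate(List<? extends Mutation> batch) throws IOException {
    try {
      delegate.mutate(batch);
    } catch (IOException e) {
      spool(batch); // HBase unavailable: keep the data on the filesystem instead
    }
  }

  public void flush() throws IOException {
    try {
      delegate.flush();
    } catch (IOException e) {
      // A real implementation would also spool whatever could not be flushed,
      // so a flush on the timeline service API never silently loses data.
    }
  }

  private void spool(List<? extends Mutation> batch) throws IOException {
    // Placeholder: serialize the mutations (e.g. as sequence files of
    // WAL-style edits) under spoolDir for a later replay job to pick up.
    fs.mkdirs(spoolDir);
  }
}
{code}

The replay job (e.g. MapReduce, as the description mentions) would then read 
everything under spoolDir and re-apply it, which is exactly where the 
reader-visibility question in the comment above comes in.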



