[ https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joep Rottinghuis updated HBASE-17018:
-------------------------------------
    Attachment: HBASE-17018.master.005.patch

[~enis] are you suggesting we don't do a double-write, but write WALs to HDFS 
only, and then have a separate set of "readers" replay the WALs from HDFS to 
HBase?

In that case we'd be writing tons of little WAL files to the source cluster's 
HDFS (not just the one backing HBase) in all cases, not just when HBase is 
down. As Sangjin pointed out, that would introduce a delay in when the writes 
become available, or else we'd have to keep track of high and low watermarks, 
rotate WALs frequently, or do something else. I'm wondering if we are just 
shifting the complexity around.
The nice thing about the current approach is that under normal circumstances 
the data written to HBase is available in near-real time (only some writes are 
buffered, and we flush roughly once a minute).
HBase itself writes its WALs to its own HDFS, which will be on a separately 
tuned cluster.
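
For illustration, here's a minimal sketch of that write path, assuming a 
hypothetical table name "timeline" and a once-a-minute background flush; the 
names and wiring are made up, just to show the shape of it:

{code:java}
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Put;

public class PeriodicFlushWriter implements AutoCloseable {
  private final BufferedMutator mutator;
  private final ScheduledExecutorService flusher =
      Executors.newSingleThreadScheduledExecutor();

  public PeriodicFlushWriter(Connection connection) throws IOException {
    // "timeline" is a placeholder table name.
    this.mutator = connection.getBufferedMutator(TableName.valueOf("timeline"));
    // Buffered writes become visible in near-real time: flush once a minute
    // even if no client ever calls flush explicitly.
    flusher.scheduleAtFixedRate(this::flushQuietly, 1, 1, TimeUnit.MINUTES);
  }

  /** Best-effort, high-volume writes: buffered until the next flush. */
  public void write(Put put) throws IOException {
    mutator.mutate(put);
  }

  /** Called when the timeline service API gets an explicit flush. */
  public void flush() throws IOException {
    mutator.flush();
  }

  private void flushQuietly() {
    try {
      mutator.flush();
    } catch (IOException e) {
      // In the spooling design, this is the point where a failure would
      // trigger spooling to the filesystem instead of losing the writes.
    }
  }

  @Override
  public void close() throws IOException {
    flusher.shutdownNow();
    mutator.close();
  }
}
{code}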

In any case, let me discuss that approach with other devs working on timeline 
service and see what they think.

In the meantime I'm attaching a new patch (version 5). It incorporates 
[~sjlee0]'s suggestion to ensure that the flushCount accounting and the 
enqueueing are done in one synchronized block, so that we avoid out-of-order 
items in the outbound queue. This logic is now moved to the coordinator. I've 
also added a simple exception handler to the coordinator and a unit test for 
it. I'm not sure how much fancier we need to get with the exception handler.
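
To make that concrete, here's a rough sketch of what the synchronized block in 
the coordinator looks like; class and field names are illustrative, not the 
actual patch:

{code:java}
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.apache.hadoop.hbase.client.Mutation;

public class SpoolCoordinator {
  /** One batch of buffered mutations, tagged with the flush it belongs to. */
  public static final class FlushBatch {
    final long flushNumber;
    final List<Mutation> mutations;

    FlushBatch(long flushNumber, List<Mutation> mutations) {
      this.flushNumber = flushNumber;
      this.mutations = mutations;
    }
  }

  private final BlockingQueue<FlushBatch> outbound = new LinkedBlockingQueue<>();
  private long flushCount = 0;

  /**
   * The flushCount accounting and the enqueueing happen in one synchronized
   * block, so batches cannot end up out of order in the outbound queue.
   */
  public synchronized long enqueueFlush(List<Mutation> mutations) {
    flushCount++;
    outbound.add(new FlushBatch(flushCount, mutations));
    return flushCount;
  }

  /** Simple exception handler: record the failure, keep the coordinator alive. */
  public void handleException(Throwable t) {
    System.err.println("SpoolCoordinator caught: " + t);
  }
}
{code}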

> Spooling BufferedMutator
> ------------------------
>
>                 Key: HBASE-17018
>                 URL: https://issues.apache.org/jira/browse/HBASE-17018
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Joep Rottinghuis
>         Attachments: HBASE-17018.master.001.patch, 
> HBASE-17018.master.002.patch, HBASE-17018.master.003.patch, 
> HBASE-17018.master.004.patch, HBASE-17018.master.005.patch, 
> HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase requirements 
> for fault tolerant writer.pdf
>
>
> For Yarn Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is 
> (temporarily) down, for example in case of an HBase upgrade.
> Most of the high-volume writes will be on a best-effort basis, but 
> occasionally we do a flush: mainly during application lifecycle events, 
> clients will call flush on the timeline service API. To handle the volume 
> of writes we use a BufferedMutator. When flush gets called on our API, we 
> in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a 
> filesystem in case of HBase errors. If we use the Hadoop filesystem 
> interface, this can be HDFS, GCS, S3, or any other distributed storage 
> (sketched just below this description). The mutations can later be 
> replayed, for example through a MapReduce job.
> https://reviews.apache.org/r/54882/
> For design of SpoolingBufferedMutatorImpl see 
> https://docs.google.com/document/d/1GTSk1Hd887gGJduUr8ZJ2m-VKrIXDUv9K3dr4u2YGls/edit?usp=sharing
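
A minimal sketch of the "spool to any Hadoop filesystem" point from the 
description above; the MutationSpool class and the length-prefixed record 
format are made up here, the real format is up to SpoolingBufferedMutatorImpl:

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MutationSpool implements AutoCloseable {
  private final FSDataOutputStream out;

  public MutationSpool(Configuration conf, Path spoolFile) throws IOException {
    // The Path scheme decides the backing store: hdfs://, s3a://, gs://, ...
    FileSystem fs = spoolFile.getFileSystem(conf);
    this.out = fs.create(spoolFile, false);
  }

  /** Appends one length-prefixed serialized mutation to the spool file. */
  public synchronized void append(byte[] serializedMutation) throws IOException {
    out.writeInt(serializedMutation.length);
    out.write(serializedMutation);
  }

  /** Flush through to the filesystem so a replay job can pick the data up. */
  public synchronized void sync() throws IOException {
    out.hflush();
  }

  @Override
  public synchronized void close() throws IOException {
    out.close();
  }
}
{code}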



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
