[
https://issues.apache.org/jira/browse/HBASE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15765091#comment-15765091
]
Enis Soztutar commented on HBASE-17018:
---------------------------------------
I've read through the requirements a bit, and I'm left with the feeling that
you need a persistent queue, a WAL (Kafka / BookKeeper) in front of HBase,
rather than spooling, for a much simpler design. The semantics would be that
writes will not fail as long as Kafka / BK is available. A single consumer
would consume the WAL and propagate the writes via BufferedMutator.
Of course, having this queue means that every write is written twice, but I
think the simpler failure conditions make the design easy enough to reason
about to justify the cost. The other concern is having to depend on yet
another system for the ATS deployment. Since ATS has a single writer and a
single consumer, how about doing a WAL on top of HDFS, effectively
implementing BK? That way you would not need to depend on another system, and
you would still effectively achieve high availability. Writes will succeed as
long as HDFS / the local fs is up and running.
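To make the shape of that concrete, here is a minimal sketch of the write
path, assuming a simple length-prefixed record encoding; the class name
(SimpleHdfsWal), the roll threshold, and the wal.<timestamp> file naming are
all hypothetical, not from the attached patches:

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SimpleHdfsWal {
  private final FileSystem fs;
  private final Path walDir;
  private final long rollSizeBytes;
  private FSDataOutputStream out;

  public SimpleHdfsWal(Configuration conf, Path walDir, long rollSizeBytes)
      throws IOException {
    this.fs = FileSystem.get(conf);
    this.walDir = walDir;
    this.rollSizeBytes = rollSizeBytes;
    roll();
  }

  /** Append one serialized mutation; hflush so it survives a writer crash. */
  public synchronized void append(byte[] serializedMutation) throws IOException {
    out.writeInt(serializedMutation.length); // length-prefixed record
    out.write(serializedMutation);
    out.hflush();                            // durable enough to replay from
    if (out.getPos() >= rollSizeBytes) {
      roll();                                // periodic roll, as HBase/Kafka do
    }
  }

  /** Close the current file and start a new one; names sort by time. */
  private synchronized void roll() throws IOException {
    if (out != null) {
      out.close();
    }
    Path current =
        new Path(walDir, String.format("wal.%020d", System.currentTimeMillis()));
    out = fs.create(current, false);
  }
}
{code}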
You can do what HBase/Kafka does in terms of periodically rolling the WAL,
keeping track of where the readers are (the reader here is the
BufferedMutator that replicates to HBase), and deleting the WAL files once
they are not needed anymore. This way, recovery is a very simple task as well
(just restart). You probably do not even need to wait for the HDFS write to
finish before sending the Put to HBase: you can effectively keep an in-memory
buffer of the recent WAL appends and have the consumer switch to that if it
is sufficiently up-to-date. Anyway, a suggestion to consider.
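And a matching sketch of the single consumer that drains rolled WAL files
into HBase and deletes each one once its edits are flushed; decodePut() is a
hypothetical inverse of the producer's encoding, and the in-memory fast path
for recent appends is omitted for brevity:

{code:java}
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Put;

public class WalConsumer {
  private final FileSystem fs;
  private final Path walDir;
  private final BufferedMutator mutator;

  public WalConsumer(FileSystem fs, Path walDir, BufferedMutator mutator) {
    this.fs = fs;
    this.walDir = walDir;
    this.mutator = mutator;
  }

  /** Replay every WAL file in order; delete each once HBase has it. */
  public void drainOnce() throws IOException {
    // A real version would skip the file the producer still has open.
    FileStatus[] files = fs.listStatus(walDir);
    Arrays.sort(files);                   // file names sort by creation time
    for (FileStatus file : files) {
      replay(file.getPath());
      mutator.flush();                    // edits are now durable in HBase
      fs.delete(file.getPath(), false);   // the WAL file is no longer needed
    }
  }

  private void replay(Path file) throws IOException {
    try (FSDataInputStream in = fs.open(file)) {
      while (true) {
        int len;
        try {
          len = in.readInt();             // next length-prefixed record
        } catch (EOFException endOfFile) {
          return;
        }
        byte[] record = new byte[len];
        in.readFully(record);
        mutator.mutate(decodePut(record));
      }
    }
  }

  /** Hypothetical decoder for the producer's encoding; elided here. */
  private Put decodePut(byte[] record) {
    throw new UnsupportedOperationException("sketch only");
  }
}
{code}

Recovery after a crash is then just restarting drainOnce(): any file still in
the WAL directory has, by construction, not yet been fully applied.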
> Spooling BufferedMutator
> ------------------------
>
> Key: HBASE-17018
> URL: https://issues.apache.org/jira/browse/HBASE-17018
> Project: HBase
> Issue Type: New Feature
> Reporter: Joep Rottinghuis
> Attachments: HBASE-17018.master.001.patch,
> HBASE-17018.master.002.patch, HBASE-17018.master.003.patch,
> HBASE-17018.master.004.patch,
> HBASE-17018SpoolingBufferedMutatorDesign-v1.pdf, YARN-4061 HBase requirements
> for fault tolerant writer.pdf
>
>
> For YARN Timeline Service v2 we use HBase as a backing store.
> A big concern we would like to address is what to do if HBase is
> (temporarily) down, for example in case of an HBase upgrade.
> Most of the high-volume writes will be on a best-effort basis, but
> occasionally we do a flush, mainly during application lifecycle events, when
> clients call a flush on the timeline service API. In order to handle the
> volume of writes we use a BufferedMutator. When flush gets called on our
> API, we in turn call flush on the BufferedMutator.
> We would like our interface to HBase to be able to spool the mutations to a
> filesystem in case of HBase errors. If we use the Hadoop filesystem
> interface, this can then be HDFS, GCS, S3, or any other distributed storage.
> The mutations can then later be re-played, for example through a MapReduce
> job.
> https://reviews.apache.org/r/54882/
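For reference, a minimal sketch of the spooling behavior the description asks
for, as a wrapper around a real BufferedMutator; the class name and spool
encoding are illustrative and not taken from the attached patches or the
review. Note that a real BufferedMutator reports most write failures
asynchronously through BufferedMutator.ExceptionListener rather than as
synchronous IOExceptions, so a production version would hook that listener;
this sketch only covers the synchronous case:

{code:java}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Mutation;

public class SpoolingMutator {
  private final BufferedMutator delegate;
  private final FSDataOutputStream spool;  // HDFS, GCS, S3, ... via FileSystem

  public SpoolingMutator(BufferedMutator delegate, FileSystem fs, Path spoolFile)
      throws IOException {
    this.delegate = delegate;
    this.spool = fs.create(spoolFile, true);
  }

  /** Forward to HBase; if that fails, spool the mutation for later replay. */
  public void mutate(Mutation m) throws IOException {
    try {
      delegate.mutate(m);
    } catch (IOException hbaseDown) {
      writeToSpool(m);                     // keep accepting writes
    }
  }

  /** Flush HBase if we can; otherwise make sure the spool is durable. */
  public void flush() throws IOException {
    try {
      delegate.flush();
    } catch (IOException hbaseDown) {
      spool.hflush();
    }
  }

  private void writeToSpool(Mutation m) throws IOException {
    // Serialize the mutation and append it length-prefixed; a later
    // MapReduce job can read the spool file and re-apply the edits.
  }
}
{code}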