[ 
https://issues.apache.org/jira/browse/HBASE-19486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niels Basjes updated HBASE-19486:
---------------------------------
    Release Note: 
The BufferedMutator now supports two settings that ensure records do not stay 
in its write buffer too long. For periodically flushing the BufferedMutator 
there is now a "Timeout" (how old the oldest record in the buffer may be before 
a flush is forced) and a "TimerTick" (how often we check whether the timeout 
has been exceeded). Using these settings you can make the BufferedMutator 
automatically flush the write buffer if no flush has occurred within the 
specified number of milliseconds.

This is mainly useful in streaming scenarios (e.g. writing data into HBase 
using Apache Flink/Beam/Storm) where it is common (especially in a 
test/development situation) to see small unpredictable bursts of data that need 
to be written into HBase. Until now, records written through the 
BufferedMutator remained in the write buffer until the buffer was full or an 
explicit flush was triggered. In practice this meant that the 'last few 
records' of a burst stayed in the write buffer until the next burst arrived, 
filled the buffer to capacity and thus triggered a flush.
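
To illustrate how the "Timeout" and "TimerTick" settings interact, here is a 
minimal, self-contained sketch. It is not the HBase implementation: the class 
PeriodicFlushBuffer, its methods and the String records are hypothetical 
stand-ins, and the clock is injected so the behaviour can be followed without 
real timers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a write buffer; NOT the HBase BufferedMutator. */
final class PeriodicFlushBuffer {
    private final int maxRecords;               // size-based flush threshold
    private final long timeoutMs;               // "Timeout": max age of the oldest record
    private final LongSupplier clock;           // assumed time source, in milliseconds
    private final Consumer<List<String>> sink;  // receives each flushed batch
    private final List<String> buffer = new ArrayList<>();
    private long oldestRecordAtMs;

    PeriodicFlushBuffer(int maxRecords, long timeoutMs,
                        LongSupplier clock, Consumer<List<String>> sink) {
        this.maxRecords = maxRecords;
        this.timeoutMs = timeoutMs;
        this.clock = clock;
        this.sink = sink;
    }

    /** Buffers a record; flushes immediately once the buffer is full. */
    synchronized void mutate(String record) {
        if (buffer.isEmpty()) {
            // Remember when the oldest record currently in the buffer arrived.
            oldestRecordAtMs = clock.getAsLong();
        }
        buffer.add(record);
        if (buffer.size() >= maxRecords) {
            flush();
        }
    }

    /** Meant to be called once per "TimerTick": flushes if the oldest record is too old. */
    synchronized void timerTick() {
        if (!buffer.isEmpty() && clock.getAsLong() - oldestRecordAtMs >= timeoutMs) {
            flush();
        }
    }

    /** Hands the buffered records to the sink and empties the buffer. */
    synchronized void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer));
        buffer.clear();
    }

    synchronized int size() {
        return buffer.size();
    }
}
```

In a real client timerTick() would typically be driven by a scheduled timer 
(for example java.util.concurrent.ScheduledExecutorService). In this sketch a 
record can wait up to roughly the timeout plus one tick interval, so a shorter 
tick gives a tighter bound at the cost of more frequent checks.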

  was:
The BufferedMutator now supports a "Write buffer maximum linger" setting. Using 
this setting makes the BufferedMutator automatically flush the write buffer if 
after the specified number of milliseconds no flush has occurred.

This is mainly useful in streaming scenarios (i.e. writing data into HBase 
using Apache Flink/Beam/Storm) where it is common (especially in a 
test/development situation) to see small unpredictable bursts of data that need 
to be written into HBase. When using the BufferedMutator till now the effect 
was that records would remain in the write buffer until the buffer was full or 
an explicit flush was triggered. In practice this would mean that the 'last few 
records' of a burst would remain in the write buffer until the next burst 
arrives filling the buffer to capacity and thus triggering a flush.


>  Periodically ensure records are not buffered too long by BufferedMutator
> -------------------------------------------------------------------------
>
>                 Key: HBASE-19486
>                 URL: https://issues.apache.org/jira/browse/HBASE-19486
>             Project: HBase
>          Issue Type: Improvement
>          Components: Client
>            Reporter: Niels Basjes
>            Assignee: Niels Basjes
>         Attachments: HBASE-19486-20171212-2117.patch, 
> HBASE-19486-20171218-1229.patch, HBASE-19486-20171218-1300.patch, 
> HBASE-19486-20171219-0933.patch, HBASE-19486-20171219-1026.patch, 
> HBASE-19486-20171219-1122-trigger-qa-run.patch, 
> HBASE-19486-20171220-1612-trigger-qa-run.patch, 
> HBASE-19486-20171220-2228-trigger-qa-run.patch
>
>
> I'm working on several projects where we are doing stream / event type 
> processing instead of batch type processing. We mostly use Apache Flink and 
> Apache Beam for these projects.
> When we ingest a continuous stream of events and feed that into HBase via a 
> BufferedMutator this all works fine. The buffer fills up at a predictable 
> rate and we can make sure it flushes several times per second into HBase by 
> tuning the buffer size.
> We also have situations where the event rate is unpredictable. Sometimes 
> because the source is in reality a batch job that puts records into Kafka, 
> sometimes because it is the "predictable in production" application in our 
> testing environment (where only the dev triggers a handful of events).
> For these kinds of use cases we need a way to 'force' the BufferedMutator to 
> automatically flush any records in the buffer even if the buffer is not full.
> I'll put up a pull request with a proposed implementation for review against 
> the master (i.e. 3.0.0).
> When approved I would like to backport this to the 1.x and 2.x versions of 
> the client in the same (as close as possible) way.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
