https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15119736#comment-15119736
PJ Van Aeken commented on FLINK-2055:
-------------------------------------
What is the latest news on this?
Having read through the mailing list thread and the corresponding code, it seems
to me that the current solution is more of a workaround. I understand the desire
to reuse what already exists, but reusing the HBase TableOutputFormat feels a
bit like making a sacrifice. I haven't had time to investigate my suspicions
thoroughly, though, and I am very interested to hear whether anyone else has.
I am by no means an HBase expert, but based on what I understand of HBase, this
is the sacrifice I believe we are making:
The native HBase TableOutputFormat was built for use in batch jobs. It uses the
BufferedMutator under the hood, which, as far as I understand, decides when to
flush based on constraints determined by HBase itself, such as the cumulative
size of the buffered Puts. That means that while we may "write" to our
TableOutputFormat every X milliseconds, HBase will still decide on its own when
to actually flush the records. HBase, in order to avoid creating a large number
of small files, also groups the Puts together, but in the meantime exposes them
through a component called the memstore, making them available before the flush.
I believe that by using the TableOutputFormat with the BufferedMutator, we are
skipping the memstore, and therefore new Puts remain unavailable until the flush.
We could of course configure HBase to flush to disk more frequently, but should
we really do that if we have an alternative?
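To make the client-side buffering described above concrete, here is a minimal Java sketch. It does not use the HBase client API: the class `WriteBufferSketch`, its `mutate`/`flush` methods and the 100-byte threshold are hypothetical, and only loosely modeled on the idea that a BufferedMutator accumulates Puts until a size threshold is crossed or `flush()` is called explicitly. It shows that buffered writes stay invisible to readers until some flush happens.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT the HBase API: writes are buffered client-side and
// only handed on ("sent") once their total size reaches a threshold or
// flush() is called, loosely mirroring how a BufferedMutator batches Puts.
class WriteBufferSketch {
    private final long writeBufferSize;                      // assumed flush threshold in bytes
    private final List<byte[]> pending = new ArrayList<>();  // writes still in the client buffer
    private final List<byte[]> sent = new ArrayList<>();     // stands in for what readers can see
    private long pendingBytes = 0;

    WriteBufferSketch(long writeBufferSize) {
        this.writeBufferSize = writeBufferSize;
    }

    // Buffer one write; flush automatically once the buffered size reaches the threshold.
    void mutate(byte[] put) {
        pending.add(put);
        pendingBytes += put.length;
        if (pendingBytes >= writeBufferSize) {
            flush();
        }
    }

    // Hand all buffered writes on, regardless of how large the buffer is.
    void flush() {
        sent.addAll(pending);
        pending.clear();
        pendingBytes = 0;
    }

    int visibleCount() { return sent.size(); }
    int bufferedCount() { return pending.size(); }

    public static void main(String[] args) {
        WriteBufferSketch buf = new WriteBufferSketch(100);
        buf.mutate(new byte[40]);
        buf.mutate(new byte[40]);
        // 80 bytes < 100: nothing has been flushed yet
        System.out.println(buf.visibleCount() + " visible, " + buf.bufferedCount() + " buffered");
        buf.mutate(new byte[40]);
        // 120 bytes >= 100: the size threshold triggered a flush
        System.out.println(buf.visibleCount() + " visible, " + buf.bufferedCount() + " buffered");
        buf.mutate(new byte[10]);
        buf.flush(); // explicit flush, e.g. driven by a timer in a streaming sink
        System.out.println(buf.visibleCount() + " visible, " + buf.bufferedCount() + " buffered");
    }
}
```

Running `main` prints "0 visible, 2 buffered", then "3 visible, 0 buffered", then "4 visible, 0 buffered". In a streaming sink, calling `flush()` on a timer or at each checkpoint, rather than relying only on the size threshold, would be one way to bound how long writes stay invisible; this is a suggestion, not something settled in the thread.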
Now, as mentioned, I'm not sure I have fully grasped the inner workings of HBase,
so I apologize if I have made some false assumptions. But based on what I
currently understand, it seems like we're making an unnecessary sacrifice here.
> Implement Streaming HBaseSink
> -----------------------------
>
> Key: FLINK-2055
> URL: https://issues.apache.org/jira/browse/FLINK-2055
> Project: Flink
> Issue Type: New Feature
> Components: Streaming, Streaming Connectors
> Affects Versions: 0.9
> Reporter: Robert Metzger
> Assignee: Hilmi Yildirim
>
> As per:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html