[
https://issues.apache.org/jira/browse/FLINK-2055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15126231#comment-15126231
]
PJ Van Aeken edited comment on FLINK-2055 at 2/1/16 2:11 PM:
-------------------------------------------------------------
Indeed, the example you described uses the native client API, which I think
is the way to go. Unfortunately, HTable is now deprecated, so the examples are
outdated. The mailing list thread linked in the issue description suggests
using the write method on DataStream combined with TableOutputFormat instead.
https://ci.apache.org/projects/flink/flink-docs-master/api/java/org/apache/flink/streaming/api/datastream/DataStream.html#write%28org.apache.flink.api.common.io.OutputFormat,%20long%29
What I am proposing instead is a SinkFunction (like the one we have for Flume,
for instance) that uses the new HBase client APIs, similar to how the example
you referred to used to work. This would replace TableOutputFormat, which, as
far as I understand, buffers requests on the client side based on some
internal heuristics, as per the HBase documentation:
https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/BufferedMutator.html
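To make the difference between the two approaches concrete, here is a minimal,
self-contained sketch. It is only an illustration under stated assumptions:
Sink, InMemoryTable, DirectSink and BufferingSink are hypothetical stand-ins,
not Flink or HBase classes, and the record-count threshold is an assumed
placeholder for whatever heuristic the real client uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins only; none of these types are Flink or HBase classes. */
class HBaseSinkSketch {

    /** Stand-in for a streaming sink: invoked once per record. */
    interface Sink<T> {
        void invoke(T value);
    }

    /** Stand-in for the remote table; holds what has actually reached the "server". */
    static class InMemoryTable {
        final List<String> rows = new ArrayList<>();

        void put(String row) {
            rows.add(row);
        }
    }

    /** Writes every record to the table as soon as it arrives. */
    static class DirectSink implements Sink<String> {
        private final InMemoryTable table;

        DirectSink(InMemoryTable table) {
            this.table = table;
        }

        @Override
        public void invoke(String value) {
            table.put(value);
        }
    }

    /** Holds records client-side until the buffer fills up or flush() is called. */
    static class BufferingSink implements Sink<String> {
        private final InMemoryTable table;
        private final int flushThreshold; // assumed heuristic: a plain record count
        private final List<String> buffer = new ArrayList<>();

        BufferingSink(InMemoryTable table, int flushThreshold) {
            this.table = table;
            this.flushThreshold = flushThreshold;
        }

        @Override
        public void invoke(String value) {
            buffer.add(value);
            if (buffer.size() >= flushThreshold) {
                flush();
            }
        }

        /** Sends all buffered records to the table and empties the buffer. */
        void flush() {
            for (String row : buffer) {
                table.put(row);
            }
            buffer.clear();
        }
    }

    public static void main(String[] args) {
        InMemoryTable directTable = new InMemoryTable();
        InMemoryTable bufferedTable = new InMemoryTable();
        Sink<String> directSink = new DirectSink(directTable);
        BufferingSink bufferingSink = new BufferingSink(bufferedTable, 3);

        // Two records arrive on the stream, fewer than the buffer threshold.
        for (String record : new String[] {"a", "b"}) {
            directSink.invoke(record);
            bufferingSink.invoke(record);
        }

        System.out.println("direct rows visible:   " + directTable.rows.size());
        System.out.println("buffered rows visible: " + bufferedTable.rows.size());

        bufferingSink.flush();
        System.out.println("buffered rows after flush: " + bufferedTable.rows.size());
    }
}
```

In this sketch, the buffering sink leaves records invisible in the table until
the threshold is reached or flush() is called explicitly, which is the
behaviour I would like a streaming sink to avoid or at least control.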
EDIT: There appears to be a version mismatch, which is why we are not seeing
the same problems. It turns out my assumptions do not hold in version 0.98.x;
I am unsure about 1.x for now, and they definitely hold for 2.x, which is
currently in snapshot. So the inner workings of TableOutputFormat have changed
in recent versions, which introduces the problem I have described.
> Implement Streaming HBaseSink
> -----------------------------
>
> Key: FLINK-2055
> URL: https://issues.apache.org/jira/browse/FLINK-2055
> Project: Flink
> Issue Type: New Feature
> Components: Streaming, Streaming Connectors
> Affects Versions: 0.9
> Reporter: Robert Metzger
> Assignee: Hilmi Yildirim
>
> As per:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Write-Stream-to-HBase-td1300.html