[ 
https://issues.apache.org/jira/browse/BEAM-2955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16165523#comment-16165523
 ] 

Solomon Duskis commented on BEAM-2955:
--------------------------------------

Chamikara: HBaseIO will have to be extended or wrapped.  Cloud Bigtable needs 
slightly different configuration options, has a different way to calculate 
estimated sizes, and needs templating.  The interface would essentially be the 
same whether we leverage HBaseIO or BigtableIO.  The BigtableIO wrapper that I 
wrote was 271 lines of code.  
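
To illustrate the point that the user-facing interface stays the same under either option, here is a minimal sketch. It is not the code from the wrapper mentioned above. Every type in it (HBaseStylePut, ProtoStyleMutation, HBaseStyleWriter, WrappingWriter, ExtendingWriter) is a hypothetical stand-in invented for this example, and none of them is a Beam, HBase, or Cloud Bigtable class.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: all types below are hypothetical stand-ins, not real library classes. */
public class ConnectorOptionsSketch {

    /** Hypothetical HBase-style write: one cell value for one row. */
    static final class HBaseStylePut {
        final String rowKey;
        final String family;
        final String qualifier;
        final String value;

        HBaseStylePut(String rowKey, String family, String qualifier, String value) {
            this.rowKey = rowKey;
            this.family = family;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Hypothetical protobuf-style mutation, as a lower-level sink might expect. */
    static final class ProtoStyleMutation {
        final String rowKey;
        final String setCell;

        ProtoStyleMutation(String rowKey, String setCell) {
            this.rowKey = rowKey;
            this.setCell = setCell;
        }
    }

    /** The user-facing surface, identical for both options. */
    interface HBaseStyleWriter {
        void write(HBaseStylePut put);
    }

    /** Wrapper option: translate each HBase-style put into a protobuf-style mutation. */
    static final class WrappingWriter implements HBaseStyleWriter {
        final List<ProtoStyleMutation> sink = new ArrayList<>();

        @Override
        public void write(HBaseStylePut put) {
            String cell = put.family + ":" + put.qualifier + "=" + put.value;
            sink.add(new ProtoStyleMutation(put.rowKey, cell));
        }
    }

    /** Extension option: the sink already speaks the HBase idiom, so puts pass through. */
    static final class ExtendingWriter implements HBaseStyleWriter {
        final List<HBaseStylePut> sink = new ArrayList<>();

        @Override
        public void write(HBaseStylePut put) {
            sink.add(put);
        }
    }

    public static void main(String[] args) {
        HBaseStylePut put = new HBaseStylePut("row1", "cf", "q", "v");
        HBaseStyleWriter[] writers = {new WrappingWriter(), new ExtendingWriter()};
        // Calling code is the same regardless of which option backs the writer.
        for (HBaseStyleWriter writer : writers) {
            writer.write(put);
        }
    }
}
```

In this sketch the calling code depends only on HBaseStyleWriter, so the choice between wrapping and extending stays an implementation detail.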

I'll create a PR for the BigtableIO wrapper in the Beam GitHub project, since 
the code is already written.  I'll also create a PR for an extension of 
HBaseIO.

That way, we can compare the two options.

> Create a Cloud Bigtable HBase connector
> ---------------------------------------
>
>                 Key: BEAM-2955
>                 URL: https://issues.apache.org/jira/browse/BEAM-2955
>             Project: Beam
>          Issue Type: New Feature
>          Components: sdk-java-gcp
>            Reporter: Solomon Duskis
>            Assignee: Solomon Duskis
>
> The Cloud Bigtable (CBT) team has had a Dataflow connector maintained in a 
> different repo for a while. Recently, we did some reworking of the Cloud 
> Bigtable client that would allow it to better coexist in the Beam ecosystem, 
> and we also released a Beam connector in our repository that exposes HBase 
> idioms rather than the Protobuf idioms of BigtableIO.  More information about 
> the customer experience of the HBase connector can be found here: 
> [https://cloud.google.com/bigtable/docs/dataflow-hbase].
> The Beam repo is a much better place to house a Cloud Bigtable HBase 
> connector.  There are a couple of ways we can implement this new connector:
> # The CBT connector depends on artifacts in the io/hbase Maven project.  We 
> can create a new extension of HBaseIO for the purposes of CBT.  We would 
> have to add some features to HBaseIO to make that work (dynamic rebalancing, 
> and a way for HBase and CBT's size estimation models to coexist).
> # The BigtableIO connector works well, and we can add an adapter layer on top 
> of it.  I have a proof of concept of it here: 
> [https://github.com/sduskis/cloud-bigtable-client/tree/add_beam/bigtable-dataflow-parent/bigtable-hbase-beam].
> # We can build a separate CBT HBase connector.
> I'm happy to do the work.  I would appreciate some guidance and discussion 
> about the right approach.



