Solomon Duskis created BEAM-2955:
------------------------------------

             Summary: Create a Cloud Bigtable HBase connector
                 Key: BEAM-2955
                 URL: https://issues.apache.org/jira/browse/BEAM-2955
             Project: Beam
          Issue Type: New Feature
          Components: sdk-java-gcp
            Reporter: Solomon Duskis
            Assignee: Chamikara Jayalath


The Cloud Bigtable (CBT) team has had a Dataflow connector maintained in a 
different repo for a while. Recently, we reworked the Cloud Bigtable client 
so that it can coexist better in the Beam ecosystem, 
and we also released a Beam connector in our repository that exposes HBase 
idioms rather than the Protobuf idioms of BigtableIO.  More information about 
the customer experience of the HBase connector can be found here: 
[https://cloud.google.com/bigtable/docs/dataflow-hbase].

The Beam repo is a much better place to house a Cloud Bigtable HBase connector. 
There are a few ways we can implement this new connector:

# The CBT connector already depends on artifacts in the io/hbase maven project. 
We can extend HBaseIO for the purposes of CBT.  To make that work, we would have 
to add some features to HBaseIO: dynamic rebalancing, and a way for the HBase 
and CBT size estimation models to coexist.
# The BigtableIO connector works well, and we can add an adapter layer on top 
of it.  I have a proof of concept of it here: 
[https://github.com/sduskis/cloud-bigtable-client/tree/add_beam/bigtable-dataflow-parent/bigtable-hbase-beam].
# We can build a separate CBT HBase connector.
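To make option 2 concrete, here is a minimal sketch of what an adapter layer could look like: it turns an HBase-style write (one row with several family/qualifier/value cells) into the row-key-plus-mutation-list shape that BigtableIO's write transform consumes (in Beam, roughly KV<ByteString, Iterable<Mutation>>). The types HBasePut, Cell, and SetCell are hypothetical, simplified stand-ins, not the real org.apache.hadoop.hbase.client.Put or com.google.bigtable.v2.Mutation classes, and this is not the code in the linked proof of concept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of option 2: adapt HBase-style writes into the
 * (row key, list of SetCell mutations) pairs that a BigtableIO-style sink expects.
 * All types here are simplified stand-ins for illustration only.
 */
public class HBaseToBigtableAdapter {

  /** Hypothetical stand-in for one cell of an HBase Put. */
  record Cell(String family, String qualifier, String value) {}

  /** Hypothetical stand-in for an HBase Put: one row, many cells. */
  record HBasePut(String row, List<Cell> cells) {}

  /** Hypothetical stand-in for a Bigtable SetCell mutation. */
  record SetCell(String familyName, String columnQualifier, String value) {}

  /** Converts one HBase-style put into a (row key -> mutations) pair, one SetCell per cell. */
  static Map.Entry<String, List<SetCell>> adapt(HBasePut put) {
    List<SetCell> mutations = new ArrayList<>();
    for (Cell c : put.cells()) {
      mutations.add(new SetCell(c.family(), c.qualifier(), c.value()));
    }
    return Map.entry(put.row(), mutations);
  }

  public static void main(String[] args) {
    HBasePut put = new HBasePut("user#42",
        List.of(new Cell("cf", "name", "Ada"), new Cell("cf", "lang", "Java")));
    Map.Entry<String, List<SetCell>> kv = adapt(put);
    // Prints "user#42 -> 2 mutations"
    System.out.println(kv.getKey() + " -> " + kv.getValue().size() + " mutations");
  }
}
```

The appeal of this design is that BigtableIO keeps owning splitting, rebalancing, and size estimation, while the adapter only translates data shapes; the trade-off is an extra conversion step per write.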

I'm happy to do the work.  I would appreciate some guidance and discussion 
about the right approach.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
