Prathap Kumar Parvathareddy created BEAM-13141:
--------------------------------------------------
Summary: Support to submit Jobs using HBaseIO to DataflowRunner
without local access to HBase Cluster
Key: BEAM-13141
URL: https://issues.apache.org/jira/browse/BEAM-13141
Project: Beam
Issue Type: Improvement
Components: sdk-java-core
Reporter: Prathap Kumar Parvathareddy
+*Context*+
Today, HBaseIO contacts the HBase cluster while building the execution graph,
for example to validate that the table exists and to calculate splits:
https://github.com/apache/beam/blob/master/sdks/java/io/hbase/src/main/java/org/apache/beam/sdk/io/hbase/HBaseIO.java#L237
In certain scenarios, Dataflow jobs are launched from systems that do not have
network access to the HBase cluster during the graph construction stage; the
cluster is reachable only at execution time, on Google Cloud. However, because
the current HBaseIO implementation requires access to the cluster from the
launching system, a job can be launched only from a system that has network
access to the HBase cluster.
+*Requirement*+
Modify HBaseIO to accept a flag (say, hasLocalAccess). If the flag is set to
false, defer validation, split calculation, and similar logic to job execution
time rather than job construction time.
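To make the request concrete, here is a minimal sketch of the intended
behaviour in plain Java. It is not HBaseIO code and uses no Beam or HBase
classes: ReadConfig, validateAtConstruction, validateAtExecution and the
tableExists predicate are hypothetical names chosen for illustration, and only
the flag name hasLocalAccess comes from the proposal above. How this would
actually map onto HBaseIO is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Predicate;

/** Illustrative sketch only; none of these types are Beam or HBase classes. */
public class DeferredValidationSketch {

    /** Hypothetical stand-in for the configuration of a read transform. */
    static final class ReadConfig {
        final String tableId;
        final boolean hasLocalAccess;         // flag name suggested in this issue
        final Predicate<String> tableExists;  // stand-in for a lookup against the cluster
        final List<String> log = new ArrayList<>();

        ReadConfig(String tableId, boolean hasLocalAccess, Predicate<String> tableExists) {
            this.tableId = tableId;
            this.hasLocalAccess = hasLocalAccess;
            this.tableExists = tableExists;
        }

        /** Runs on the launching machine while the graph is being built. */
        void validateAtConstruction() {
            if (!hasLocalAccess) {
                // No route to the cluster from here: do not touch it.
                log.add("construction: skipped");
                return;
            }
            checkTable("construction");
        }

        /** Runs on a worker once the job is executing. */
        void validateAtExecution() {
            if (hasLocalAccess) {
                return; // already validated at construction time
            }
            checkTable("execution");
        }

        private void checkTable(String phase) {
            if (!tableExists.test(tableId)) {
                throw new IllegalArgumentException("Table " + tableId + " does not exist");
            }
            log.add(phase + ": validated");
        }
    }

    public static void main(String[] args) {
        // Simulated cluster that is unreachable from the launcher.
        AtomicBoolean reachable = new AtomicBoolean(false);
        Predicate<String> lookup = id -> {
            if (!reachable.get()) {
                throw new IllegalStateException("cluster unreachable");
            }
            return true;
        };

        ReadConfig config = new ReadConfig("my-table", false, lookup);
        config.validateAtConstruction(); // succeeds without contacting the cluster

        reachable.set(true);             // now "running on a worker"
        config.validateAtExecution();    // deferred check happens here

        config.log.forEach(System.out::println);
    }
}
```

With hasLocalAccess set to true the sketch behaves as HBaseIO is described
above: the check runs at construction time and fails if the cluster cannot be
reached from the launching system.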
--
This message was sent by Atlassian Jira
(v8.3.4#803005)