Reading from BigQuery on portable runners in Python SDK

Kamil Wasilewski Tue, 01 Oct 2019 02:17:37 -0700

Hi all,

At the moment, the Python SDK has a native BigQuery source, which can be
used only by the Dataflow runner. Consequently, it doesn't work on portable
runners such as Flink.


Recently I wrote a prototype source that implements
iobase.BoundedSource, so that other runners can read from BigQuery as well.
It works the same way as in the Java SDK [1]: it exports the BigQuery
table to JSON files and returns TextSource objects from the split() call.
However, it has the following problems:
- it doesn't work on the Direct runner,
- its API is highly experimental.
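To make the export-then-read approach concrete, here is a minimal pure-Python sketch with no Beam or BigQuery dependency. Everything in it is hypothetical: `fake_export_table` stands in for a BigQuery export job that writes newline-delimited JSON shards, `JsonFileSource` plays the role of a per-file TextSource, and `ExportBasedTableSource.split()` mimics what a BoundedSource.split() would return. It is not the prototype's code and not the Beam API.

```python
import json
import os
import tempfile
from typing import Dict, Iterator, List


def fake_export_table(rows: List[Dict], out_dir: str, rows_per_file: int) -> List[str]:
    """Hypothetical stand-in for a BigQuery export job.

    Writes the rows as newline-delimited JSON (one object per line),
    split into shards of at most rows_per_file rows, and returns the paths.
    """
    paths = []
    for i in range(0, len(rows), rows_per_file):
        path = os.path.join(out_dir, f"export-{i // rows_per_file:05d}.json")
        with open(path, "w") as f:
            for row in rows[i:i + rows_per_file]:
                f.write(json.dumps(row) + "\n")
        paths.append(path)
    return paths


class JsonFileSource:
    """Reads one exported shard; plays the role of a TextSource."""

    def __init__(self, path: str):
        self.path = path

    def read(self) -> Iterator[Dict]:
        with open(self.path) as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)


class ExportBasedTableSource:
    """Hypothetical bounded source: split() exports the table, then
    returns one file-reading source per exported shard."""

    def __init__(self, rows: List[Dict], rows_per_file: int = 2):
        self._rows = rows
        self._rows_per_file = rows_per_file
        self._tmp_dir = tempfile.mkdtemp()

    def split(self) -> List[JsonFileSource]:
        paths = fake_export_table(self._rows, self._tmp_dir, self._rows_per_file)
        return [JsonFileSource(p) for p in paths]


# Usage: a three-row "table" exported into shards of two rows each.
table = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
source = ExportBasedTableSource(table, rows_per_file=2)
splits = source.split()
rows = [row for s in splits for row in s.read()]
```

In a real source the export would be a BigQuery job writing to GCS, and the exported files would need cleaning up afterwards; this sketch leaves its temporary directory behind.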

This is where my question comes in. What should we do to support
reading from BigQuery on runners other than Dataflow? Do you
think it's fine to continue working on the source I described, or should
it be done in an entirely different way (for example, without exporting
tables to JSON)?

Thanks,
Kamil

[1] https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySourceBase.java
