If anyone is interested, here is a link to my code:
https://github.com/kamilwu/beam/tree/bounded-source-for-bq
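
For readers who just want the rough shape of the approach described in the quoted message below (export the table to JSON files, then hand back one text source per file from split()), here is a minimal sketch. It is not the code from the branch above and it deliberately does not import Beam: SourceBundle is a stand-in for Beam's iobase.SourceBundle, and JsonLinesFileSource, ExportingBigQuerySource, fake_export and read_all are hypothetical names made up for illustration. A real source would subclass iobase.BoundedSource and start an actual BigQuery export job instead of writing temp files.

```python
import json
import os
import tempfile
from collections import namedtuple

# Stand-in for Beam's iobase.SourceBundle so the sketch runs without Beam.
SourceBundle = namedtuple(
    "SourceBundle", "weight source start_position stop_position")


class JsonLinesFileSource:
    """Hypothetical stand-in for a text source reading one exported file."""

    def __init__(self, path):
        self.path = path

    def read(self):
        # Each non-empty line of the exported file is one JSON-encoded row.
        with open(self.path) as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)


class ExportingBigQuerySource:
    """Sketch of a bounded source: export the table, then split per file."""

    def __init__(self, table, export_fn):
        self.table = table
        # Assumption: export_fn maps a table name to a list of file paths.
        # A real source would run a BigQuery export job and wait for it.
        self.export_fn = export_fn
        self._files = None

    def _exported_files(self):
        # Export lazily and only once, however often split() is called.
        if self._files is None:
            self._files = self.export_fn(self.table)
        return self._files

    def split(self, desired_bundle_size, start_position=None,
              stop_position=None):
        # One bundle per exported file; the file size serves as the weight.
        for path in self._exported_files():
            yield SourceBundle(
                os.path.getsize(path), JsonLinesFileSource(path), None, None)


def fake_export(table):
    """Pretend export: writes two newline-delimited JSON shards locally."""
    out_dir = tempfile.mkdtemp()
    rows = [{"id": i, "table": table} for i in range(4)]
    paths = []
    for shard in range(2):
        path = os.path.join(out_dir, "shard-%d.json" % shard)
        with open(path, "w") as f:
            for row in rows[shard::2]:
                f.write(json.dumps(row) + "\n")
        paths.append(path)
    return paths


def read_all(source):
    """Read every row by iterating over the bundles returned by split()."""
    return [
        row
        for bundle in source.split(desired_bundle_size=1 << 20)
        for row in bundle.source.read()
    ]
```

With this layout the runner only ever reads plain files, so anything BigQuery-specific is confined to the export step. That is the part a different design (one that avoids exporting to JSON) would have to replace.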

On Tue, Oct 1, 2019 at 11:17 AM Kamil Wasilewski <
kamil.wasilew...@polidea.com> wrote:

> Hi all,
>
> At the moment, we have a native BigQuery source for the Python SDK, which
> can be used only by the Dataflow runner. Consequently, it doesn't work on
> portable runners, such as Flink.
>
> I recently wrote a prototype source that implements
> iobase.BoundedSource, so that other runners can read from BigQuery as well.
> It works the same way as in the Java SDK [1]: it exports the BigQuery
> table to JSON and returns TextSource objects from the split() call.
> However, it has the following problems:
> - it doesn't work on the Direct runner,
> - its API is highly experimental.
>
> This is where my question begins. What should we do in order to provide
> support for reading from BigQuery on runners other than Dataflow? Do you
> think it's fine to continue working on the source I described? Or maybe it
> should be done in an entirely different way (not by exporting tables to
> JSON)?
>
> Thanks,
> Kamil
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySourceBase.java
>
