If anyone is interested, here is a link to my code: https://github.com/kamilwu/beam/tree/bounded-source-for-bq
On Tue, Oct 1, 2019 at 11:17 AM Kamil Wasilewski <kamil.wasilew...@polidea.com> wrote:
> Hi all,
>
> At the moment, we have a native BigQuery source for the Python SDK, which
> can be used only by the Dataflow runner. Consequently, it doesn't work on
> portable runners such as Flink.
>
> Recently I wrote a prototype source that implements
> iobase.BoundedSource, so that other runners can read from BigQuery as well.
> It works the same way as the Java SDK source [1], which means that it
> exports the BigQuery table to JSON and returns TextSource objects from the
> split() call. However, it has the following problems:
> - it doesn't work on the Direct runner,
> - its API is highly experimental.
>
> This is where my question begins. What should we do to support reading
> from BigQuery on runners other than Dataflow? Do you think it's fine to
> continue working on the source I described? Or should it be done in an
> entirely different way (not by exporting tables to JSON)?
>
> Thanks,
> Kamil
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySourceBase.java
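For anyone unfamiliar with the export-based approach, here is a minimal, Beam-free sketch of the idea. It assumes the table has already been exported as newline-delimited JSON files into a local directory, and it assumes one sub-source per exported file. The class names `ExportedTableSource` and `JsonFileSource` and the helper `read_all` are hypothetical and are not part of the Beam API; in the prototype described above, the sub-sources returned from `split()` are Beam TextSource objects instead.

```python
import json
import os
import tempfile
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class JsonFileSource:
    """Hypothetical sub-source: reads one exported newline-delimited JSON file."""
    path: str

    def read(self) -> Iterator[dict]:
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:  # skip blank lines between records
                    yield json.loads(line)


@dataclass
class ExportedTableSource:
    """Hypothetical top-level source over the files produced by a table export."""
    export_dir: str

    def split(self) -> List[JsonFileSource]:
        # Assumption: one sub-source per exported file, sorted for determinism.
        files = sorted(
            os.path.join(self.export_dir, name)
            for name in os.listdir(self.export_dir)
            if name.endswith(".json")
        )
        return [JsonFileSource(path) for path in files]


def read_all(source: ExportedTableSource) -> List[dict]:
    """Reads every sub-source in order; a runner would do this in parallel."""
    rows: List[dict] = []
    for sub in source.split():
        rows.extend(sub.read())
    return rows
```

The point of the split is that each exported file can be read independently, which is what lets a runner distribute the work; the export itself happens once, before `split()` hands out the pieces.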