+Chamikara Jayalath <chamik...@google.com> and +Pablo Estrada <pabl...@google.com> might have ideas related to this.
On Tue, Oct 1, 2019 at 2:39 AM Kamil Wasilewski <kamil.wasilew...@polidea.com> wrote:

> If anyone is interested, here is a link to my code:
> https://github.com/kamilwu/beam/tree/bounded-source-for-bq
>
> On Tue, Oct 1, 2019 at 11:17 AM Kamil Wasilewski <kamil.wasilew...@polidea.com> wrote:
>
>> Hi all,
>>
>> At the moment, we have a BigQuery native source for Python SDK, which can
>> be used only by Dataflow runner. Consequently, it doesn't work on portable
>> runners, such as Flink.
>>
>> Recently I have written a prototypical source which implements
>> iobase.BoundedSource, so that other runners can read from BigQuery as well.
>> It works the same way as in Java SDK [1], which means that it exports
>> BigQuery table to JSON and returns TextSource objects in the split() call.
>> However, it has the following problems:
>> - it doesn't work on Direct runner,
>>
> I believe DirectRunner already have an implementation for reading from BQ.
>
> - its API is highly experimental.
>>
> Which API is highly experimental?
>
>> This is where my question begins. What should we do in order to provide
>> support for reading from BigQuery on other runners than Dataflow? Do you
>> think it's fine to continue working on the source I described? Or maybe it
>> should be done in an entirely different way (not by exporting tables to
>> JSON)?
>>
>> Thanks,
>> Kamil
>>
>> [1]
>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySourceBase.java
>>
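
For readers following the thread, here is a rough sketch of the export-then-split idea described in the quoted message: export the table to newline-delimited JSON files once, then have split() hand back one sub-source per exported file. It is not the code from the linked branch and it deliberately avoids Beam imports. The names ExportingTableSource, JsonFileSource, fake_export and read_all are hypothetical, and fake_export is only a stand-in for a real BigQuery export job. In Beam the outer class would presumably subclass iobase.BoundedSource and return text sources from split(), as the message says.

```python
import json
import os


class JsonFileSource:
    """Stand-in for a per-file text source: reads one newline-delimited JSON file."""

    def __init__(self, path):
        self.path = path

    def read(self):
        with open(self.path) as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)


class ExportingTableSource:
    """Hypothetical source: export the table once, then split into one source per file."""

    def __init__(self, export_fn, export_dir):
        # export_fn is a placeholder for a real export job. It must write
        # newline-delimited JSON files into export_dir and return their paths.
        self._export_fn = export_fn
        self._export_dir = export_dir
        self._paths = None

    def _ensure_exported(self):
        # Run the export lazily and cache the result, so that calling
        # split() more than once does not trigger a second export.
        if self._paths is None:
            self._paths = sorted(self._export_fn(self._export_dir))
        return self._paths

    def split(self):
        for path in self._ensure_exported():
            yield JsonFileSource(path)


def fake_export(export_dir):
    """Assumed stand-in for the export step: writes 5 rows in shards of 2."""
    rows = [{"id": i, "name": "row-%d" % i} for i in range(5)]
    paths = []
    for shard, start in enumerate(range(0, len(rows), 2)):
        path = os.path.join(export_dir, "shard-%05d.json" % shard)
        with open(path, "w") as f:
            for row in rows[start:start + 2]:
                f.write(json.dumps(row) + "\n")
        paths.append(path)
    return paths


def read_all(source):
    """Read every row from every sub-source, in shard order."""
    return [row for sub in source.split() for row in sub.read()]
```

Calling read_all(ExportingTableSource(fake_export, some_directory)) reads the rows back shard by shard. A runner would instead distribute the sub-sources returned by split() across workers.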