Yes, this is something we wanted to do for some time but could not prioritize due to other high-priority work. The JIRA is https://issues.apache.org/jira/browse/BEAM-1440.
Note that BigQuery sources have many moving parts, and the Java BigQuery
source [1] is one of the most complicated sources we have. So I suggest
following the Java implementation closely when implementing the Python
version (a rough sketch of what that could look like is at the bottom of
this mail).

Another option would be to wait until we have Splittable DoFn for Python
bounded sources, which is expected to be available soon. Waiting is not
strictly necessary, though, since we'll be providing converters from
BoundedSources to SDF (but pure SDF versions will probably be better in
some regards).

Thanks,
Cham

[1]
https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L546

On Tue, Oct 1, 2019 at 8:48 AM Ahmet Altay <al...@google.com> wrote:

> +Chamikara Jayalath <chamik...@google.com> and +Pablo Estrada
> <pabl...@google.com> might have ideas related to this.
>
> On Tue, Oct 1, 2019 at 2:39 AM Kamil Wasilewski <
> kamil.wasilew...@polidea.com> wrote:
>
>> If anyone is interested, here is a link to my code:
>> https://github.com/kamilwu/beam/tree/bounded-source-for-bq
>>
>> On Tue, Oct 1, 2019 at 11:17 AM Kamil Wasilewski <
>> kamil.wasilew...@polidea.com> wrote:
>>
>>> Hi all,
>>>
>>> At the moment, we have a BigQuery native source for the Python SDK,
>>> which can be used only by the Dataflow runner. Consequently, it
>>> doesn't work on portable runners, such as Flink.
>>>
>>> Recently I have written a prototypical source which implements
>>> iobase.BoundedSource, so that other runners can read from BigQuery
>>> as well. It works the same way as in the Java SDK [1], which means
>>> that it exports the BigQuery table to JSON and returns TextSource
>>> objects in the split() call. However, it has the following problems:
>>> - it doesn't work on the Direct runner,
>>
> I believe DirectRunner already has an implementation for reading from BQ.
>
>> - its API is highly experimental.
>>
> Which API is highly experimental?
>
>>> This is where my question begins. What should we do in order to
>>> provide support for reading from BigQuery on runners other than
>>> Dataflow? Do you think it's fine to continue working on the source I
>>> described? Or maybe it should be done in an entirely different way
>>> (not by exporting tables to JSON)?
>>>
>>> Thanks,
>>> Kamil
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySourceBase.java
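
P.S. Purely for illustration, here is a rough, untested sketch of the kind
of iobase.BoundedSource being discussed above: split() kicks off a BigQuery
export to newline-delimited JSON and hands back one sub-source per exported
file, roughly mirroring what BigQuerySourceBase does on the Java side. The
helper _export_table_to_json_sources() is a placeholder for the export
logic, not an existing Beam API, so treat this as a starting point rather
than a working implementation.

import uuid

from apache_beam.io import iobase


class _CustomBigQuerySource(iobase.BoundedSource):
  """Reads a BigQuery table by exporting it to JSON files and reading those."""

  def __init__(self, table, gcs_location):
    self._table = table
    self._gcs_location = gcs_location

  def estimate_size(self):
    # Could fetch the table's numBytes from the BigQuery API; returning
    # None lets the runner fall back to its defaults.
    return None

  def split(self, desired_bundle_size, start_position=None,
            stop_position=None):
    # Placeholder helper: runs a BigQuery extract job that writes JSON
    # files under <gcs_location>/<job id>/ and returns one file-based
    # sub-source (e.g. a TextSource, as in Kamil's prototype) per file.
    export_dir = '%s/%s' % (self._gcs_location, uuid.uuid4().hex)
    for source in self._export_table_to_json_sources(self._table, export_dir):
      yield iobase.SourceBundle(
          weight=1.0, source=source, start_position=None, stop_position=None)

  def get_range_tracker(self, start_position, stop_position):
    # All reading is delegated to the sub-sources produced by split(), so
    # this source is never read directly.
    raise NotImplementedError('This source should only be split.')

  def read(self, range_tracker):
    raise NotImplementedError('This source should only be split.')

The interesting parts are all hidden in the placeholder: how the export job
is started and waited on, how the table schema is recovered, and how the
exported rows are converted back into Python dictionaries. That is exactly
where following the Java implementation closely should help.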