Re: Reading from BigQuery on portable runners in Python SDK

Ahmet Altay Tue, 01 Oct 2019 09:12:20 -0700

+Chamikara Jayalath <chamik...@google.com> and +Pablo Estrada
<pabl...@google.com> might have ideas related to this.


On Tue, Oct 1, 2019 at 2:39 AM Kamil Wasilewski <
kamil.wasilew...@polidea.com> wrote:

> If anyone is interested, here is a link to my code:
> https://github.com/kamilwu/beam/tree/bounded-source-for-bq
>
> On Tue, Oct 1, 2019 at 11:17 AM Kamil Wasilewski <
> kamil.wasilew...@polidea.com> wrote:
>
>> Hi all,
>>
>> At the moment, we have a native BigQuery source for the Python SDK, which
>> can be used only by the Dataflow runner. Consequently, it doesn't work on
>> portable runners, such as Flink.
>>
>> Recently I wrote a prototype source that implements
>> iobase.BoundedSource, so that other runners can read from BigQuery as well.
>> It works the same way as in the Java SDK [1]: it exports the BigQuery
>> table to JSON and returns TextSource objects from the split() call.
>> However, it has the following problems:
>> - it doesn't work on the Direct runner,
>>
>
I believe DirectRunner already has an implementation for reading from BQ.


>> - its API is highly experimental.
>>
>
Which API is highly experimental?


>
>> This is where my question begins. What should we do in order to support
>> reading from BigQuery on runners other than Dataflow? Do you think it's
>> fine to continue working on the source I described? Or maybe it should be
>> done in an entirely different way (not by exporting tables to JSON)?
>>
>> Thanks,
>> Kamil
>>
>> [1]
>> https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQuerySourceBase.java
>>
>
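
For reference, the export-then-split approach described in the quoted
message can be sketched roughly as below. This is only a minimal,
stdlib-only illustration and is not the code from the linked branch or
the Beam API: ExportingTableSource, ExportedFileSource, read_all and
rows_per_file are hypothetical names, the in-memory row list stands in
for a BigQuery table, and writing local files stands in for a BigQuery
export job. Only the shape (export once, then return one file-backed
sub-source per exported file from split()) follows the description.

```python
import json
import os
import tempfile


class ExportedFileSource:
    """Hypothetical stand-in for a text source over one exported JSON file."""

    def __init__(self, path):
        self.path = path

    def read(self):
        # Each non-empty line holds one JSON-encoded row.
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if line.strip():
                    yield json.loads(line)


class ExportingTableSource:
    """Hypothetical source: export the table to JSON, then split per file."""

    def __init__(self, rows, rows_per_file=2):
        self.rows = rows  # stand-in for a BigQuery table
        self.rows_per_file = rows_per_file
        self._files = None  # export happens lazily, at most once

    def _export(self):
        # Stand-in for a BigQuery export job writing newline-delimited JSON.
        if self._files is None:
            out_dir = tempfile.mkdtemp()
            self._files = []
            for start in range(0, len(self.rows), self.rows_per_file):
                index = start // self.rows_per_file
                path = os.path.join(out_dir, f"part-{index:05d}.json")
                with open(path, "w", encoding="utf-8") as f:
                    for row in self.rows[start:start + self.rows_per_file]:
                        f.write(json.dumps(row) + "\n")
                self._files.append(path)
        return self._files

    def split(self):
        # One sub-source per exported file, mirroring the idea of
        # returning text sources from split().
        for path in self._export():
            yield ExportedFileSource(path)


def read_all(source):
    """Read every row by iterating over all sub-sources in order."""
    return [row for sub in source.split() for row in sub.read()]
```

Because the export is cached, calling split() more than once does not
repeat it; a real source would also need to clean up the exported files.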
