[https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=355247&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-355247]
ASF GitHub Bot logged work on BEAM-1440:
----------------------------------------
Author: ASF GitHub Bot
Created on: 06/Dec/19 15:01
Start Date: 06/Dec/19 15:01
Worklog Time Spent: 10m
Work Description: kamilwu commented on issue #9772: [BEAM-1440] Create a
BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-562605478
Thanks @robertwb for your comments!
> Why does this not work on the direct runners? Is it an issue of needing to be split first?
Yes. I've already created a JIRA issue for this:
https://issues.apache.org/jira/browse/BEAM-8528
> Would it make sense to implement this as an SDF instead?
My first attempt was a regular (non-splittable) DoFn that triggered an export job, followed by the `MatchAll` and `ReadMatches` transforms. This worked, but I had trouble implementing the rest: waiting for the query job, waiting for the export job, and removing the JSON files after reading. Using the Source API turned out to be simpler.
----------------------------------------------------------------
Issue Time Tracking
-------------------
Worklog Id: (was: 355247)
Time Spent: 16.5h (was: 16h 20m)
> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> ------------------------------------------------------------------------------
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chamikara Madhusanka Jayalath
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 16.5h
> Remaining Estimate: 0h
>
> Currently, the Python SDK has a native BigQuery source [1], which can only be used by the Dataflow runner.
> We should implement a Beam BigQuery source that implements the
> iobase.BoundedSource [2] interface so that other runners using the
> Python SDK can read from BigQuery as well. The Java SDK already has a Beam
> BigQuery source [3].
> [1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189
--
This message was sent by Atlassian Jira
(v8.3.4#803005)