[ https://issues.apache.org/jira/browse/BEAM-1440?focusedWorklogId=355248&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-355248 ]
ASF GitHub Bot logged work on BEAM-1440:
----------------------------------------

Author: ASF GitHub Bot
Created on: 06/Dec/19 15:01
Start Date: 06/Dec/19 15:01
Worklog Time Spent: 10m
Work Description: kamilwu commented on issue #9772: [BEAM-1440] Create a BigQuery source that implements iobase.BoundedSource for Python
URL: https://github.com/apache/beam/pull/9772#issuecomment-562605478

Thanks @robertwb for your comments!

> Why does this not work on the direct runners? Is it an issue of needing to be split first?

Yes. I've already created a Jira issue for this: https://issues.apache.org/jira/browse/BEAM-8528

> Would it make sense to implement this as an SDF instead?

My first attempt was a regular (non-splittable) DoFn that triggers an export job, followed by `MatchAll` and `ReadMatches` transforms. This worked, but I had trouble implementing the rest: waiting for the query job, waiting for the export job, and removing the JSON files after reading. Using the Source API turned out to be simpler.

Issue Time Tracking
-------------------

Worklog Id: (was: 355248)
Time Spent: 16h 40m (was: 16.5h)

> Create a BigQuery source (that implements iobase.BoundedSource) for Python SDK
> ------------------------------------------------------------------------------
>
> Key: BEAM-1440
> URL: https://issues.apache.org/jira/browse/BEAM-1440
> Project: Beam
> Issue Type: New Feature
> Components: sdk-py-core
> Reporter: Chamikara Madhusanka Jayalath
> Assignee: Kamil Wasilewski
> Priority: Major
> Time Spent: 16h 40m
> Remaining Estimate: 0h
>
> Currently we have a BigQuery native source for the Python SDK [1].
> This can only be used by the Dataflow runner.
> We should implement a Beam BigQuery source that implements the
> iobase.BoundedSource [2] interface, so that other runners using the
> Python SDK can read from BigQuery as well. The Java SDK already has a
> Beam BigQuery source [3].
>
> [1] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py
> [2] https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/iobase.py#L70
> [3] https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1189
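The comment above ties the direct-runner failure to the source needing to be split first. The following is a minimal sketch of that split-before-read shape in plain Python, without importing Beam. The class names, the `run_export_job` stub and the fake exported files are all hypothetical, and only the `split()`/`read()` method names echo iobase.BoundedSource. As one plausible design (an assumption, not the PR's code), the sketch runs the export job when the source is split and turns each exported file into its own bundle.

```python
"""Pure-Python sketch of a bounded source that must be split before reading.

Nothing here imports Apache Beam: the method names only mirror the
split()/read() shape of iobase.BoundedSource, and the export step is a
hypothetical stub standing in for a BigQuery export job.
"""
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional

# Hypothetical stand-in for the JSON files an export job would write.
FAKE_EXPORT: Dict[str, List[dict]] = {
    "export-000.json": [{"id": 1}, {"id": 2}],
    "export-001.json": [{"id": 3}],
}


def run_export_job(table: str) -> List[str]:
    """Hypothetical stub: pretend to export `table`, return the file names."""
    return sorted(FAKE_EXPORT)


@dataclass
class FileSubSource:
    """One bundle produced by splitting: reads a single exported file."""
    path: str

    def read(self) -> Iterator[dict]:
        yield from FAKE_EXPORT[self.path]


class SplitFirstTableSource:
    """Top-level source: the export happens lazily, at split time."""

    def __init__(self, table: str):
        self.table = table
        self._files: Optional[List[str]] = None

    def split(self) -> List[FileSubSource]:
        # Run the (stubbed) export only once, then hand out one
        # sub-source per exported file.
        if self._files is None:
            self._files = run_export_job(self.table)
        return [FileSubSource(path) for path in self._files]

    def read(self) -> Iterator[dict]:
        # Nothing has been exported yet, so an unsplit read cannot work.
        raise NotImplementedError("source must be split before being read")


def read_all(source: SplitFirstTableSource) -> List[dict]:
    """What a splitting runner would do: split, then read every bundle."""
    rows: List[dict] = []
    for bundle in source.split():
        rows.extend(bundle.read())
    return rows
```

Here `read_all(SplitFirstTableSource("my-project:my_dataset.my_table"))` returns the three stubbed rows, while calling `read()` directly on the unsplit source raises `NotImplementedError`. That is roughly the situation the comment describes for runners that read a source without splitting it first (tracked in BEAM-8528).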