kennknowles opened a new issue, #19442: URL: https://github.com/apache/beam/issues/19442
When using the BigQuery source with a SQL query in a pipeline, the "processing location" is not taken into consideration and the pipeline fails.

For example, consider the following, which uses `BigQuerySource` to read from BigQuery with a SQL query. The BigQuery dataset and tables are located in `australia-southeast1`. The query is submitted successfully ([Beam works out the processing location by examining the first table referenced in the query and sets it accordingly](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221)), but when Beam polls for the job status after submission, the request fails because it does not set `location` to `australia-southeast1`, which BigQuery requires:

```
p | 'read' >> beam.io.Read(beam.io.BigQuerySource(use_standard_sql=True, query='SELECT * from `a_project_id.dataset_in_australia.table_in_australia`'))
```

```
HttpNotFoundError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects/a_project_id/queries/5ad9cc803baa432290b6cd0203f556d9?alt=json&maxResults=10000>: response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; charset=UTF-8'}>, content <{
  "error": {
    "code": 404,
    "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
    "errors": [
      {
        "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
        "domain": "global",
        "reason": "notFound"
      }
    ],
    "status": "NOT_FOUND"
  }
}
```

The problem can be seen here:
[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571](https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571)
[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357)

The location of the job (in this case `australia-southeast1`) needs to be set or inferred (or exposed via the API); otherwise the status request fails.

For reference, Airflow had the same bug: [https://github.com/apache/airflow/pull/4695](https://github.com/apache/airflow/pull/4695)

Imported from Jira [BEAM-6910](https://issues.apache.org/jira/browse/BEAM-6910). Original Jira may contain additional context.
Reported by: polleyg.
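
To make the missing parameter concrete, here is a minimal sketch of the polling URL with and without `location`. It is not Beam's code: the helper name `get_query_results_url` is hypothetical, and the base URL and the `alt`/`maxResults` parameters are copied from the failing request above. The assumption is that the BigQuery `jobs.getQueryResults` REST method accepts `location` as a query parameter, which is what the status request would need to carry.

```python
from urllib.parse import urlencode

# Base URL taken from the failing request in the error message above.
BASE = "https://www.googleapis.com/bigquery/v2"


def get_query_results_url(project_id, job_id, location=None, max_results=10000):
    """Hypothetical helper: build the URL used to poll a query job's results."""
    params = {"alt": "json", "maxResults": max_results}
    if location is not None:
        # Assumption: for jobs outside the default locations, the request
        # must carry the job's location, otherwise the job is not found.
        params["location"] = location
    return f"{BASE}/projects/{project_id}/queries/{job_id}?{urlencode(params)}"


job_id = "5ad9cc803baa432290b6cd0203f556d9"

# The request as currently sent (matches the URL in the 404 above).
without_location = get_query_results_url("a_project_id", job_id)

# The request with the job's location included.
with_location = get_query_results_url(
    "a_project_id", job_id, location="australia-southeast1"
)

print(without_location)
print(with_location)
```

In Beam itself the equivalent change would presumably be to pass the job's location along when the status request is built, rather than constructing URLs by hand.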
