[https://issues.apache.org/jira/browse/BEAM-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17548626#comment-17548626]
Danny McCormick commented on BEAM-10172:
----------------------------------------
This issue has been migrated to https://github.com/apache/beam/issues/20249
> BigQuerySource external data source support in non-US regions
> -------------------------------------------------------------
>
> Key: BEAM-10172
> URL: https://issues.apache.org/jira/browse/BEAM-10172
> Project: Beam
> Issue Type: Bug
> Components: io-py-gcp
> Affects Versions: 2.18.0
> Environment: DirectRunner, DataflowRunner
> Reporter: Boris Shilov
> Priority: P3
>
> I am attempting to query an [external data source|https://cloud.google.com/bigquery/external-data-sources],
> a MySQL database that is exposed via the BigQuery API and located in the EU region.
> I use the following query string:
> {code:python}
> query = """
> SELECT *
> FROM EXTERNAL_QUERY("my-project-one-253518.eu.external-source",
> "SELECT * FROM my schema.mytable;");
> """
> {code}
> And the following pipeline instantiation:
> {code:python}
> pcoll = p | "Load " + name >> beam.io.Read(
>     beam.io.BigQuerySource(query=query, use_standard_sql=True)
> )
> {code}
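> For completeness, here is a minimal, self-contained sketch of how I run the snippet
> above; the pipeline options, temp bucket, and step name are placeholders rather than
> my exact setup:
> {code:python}
> # Minimal repro sketch; project IDs, bucket, and step name are placeholders.
> import apache_beam as beam
> from apache_beam.options.pipeline_options import PipelineOptions
>
> query = """
> SELECT *
> FROM EXTERNAL_QUERY("my-project-one-253518.eu.external-source",
>                     "SELECT * FROM my schema.mytable;");
> """
>
> name = "external_query"  # placeholder step name
> options = PipelineOptions(
>     project="my-project-two",            # placeholder billing project
>     temp_location="gs://my-bucket/tmp",  # placeholder GCS path
> )
>
> with beam.Pipeline(options=options) as p:
>     pcoll = p | "Load " + name >> beam.io.Read(
>         beam.io.BigQuerySource(query=query, use_standard_sql=True)
>     )
>     pcoll | "Print" >> beam.Map(print)
> {code}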
> When I run this, I see the following output:
> {code:python}
> WARNING:root:Dataset my-project-two:temp_dataset_f07dd1398b0443edaa67c360f5be6958 does not exist so we will create it as temporary with location=None
> ERROR:root:Exception at bundle <apache_beam.runners.direct.bundle_factory._Bundle object at 0x127124640>, due to an exception.
> Traceback (most recent call last):
>   File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/executor.py", line 345, in call
>     finish_state)
>   File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/executor.py", line 385, in attempt_call
>     result = evaluator.finish_bundle()
>   File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 323, in finish_bundle
>     bundles = _read_values_to_bundles(reader)
>   File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 310, in _read_values_to_bundles
>     read_result = [GlobalWindows.windowed_value(e) for e in reader]
>   File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 310, in <listcomp>
>     read_result = [GlobalWindows.windowed_value(e) for e in reader]
>   File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 937, in __iter__
>     flatten_results=self.flatten_results):
>   File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 710, in run_query
>     page_token, location=location)
>   File "venv/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 209, in wrapper
>     return fun(*args, **kwargs)
>   File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 384, in _get_query_results
>     response = self.client.jobs.GetQueryResults(request)
>   File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 312, in GetQueryResults
>     config, request, global_params=global_params)
>   File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
>     return self.ProcessHttpResponse(method_config, http_response, request)
>   File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
>     self.__ProcessHttpResponse(method_config, http_response, request))
>   File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
>     http_response, method_config=method_config, request=request)
> apitools.base.py.exceptions.HttpBadRequestError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects/my-project-two/queries/636272a8e026434d85200b3f14f719ed?alt=json&location=US&maxResults=10000>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Tue, 02 Jun 2020 11:29:27 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'alt-svc': 'h3-27=":443"; ma=2592000,h3-25=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"', 'transfer-encoding': 'chunked', 'status': '400', 'content-length': '354', '-content-encoding': 'gzip'}>, content <{
>   "error": {
>     "code": 400,
>     "message": "Cannot read and write in different locations: source: EU, destination: US",
>     "errors": [
>       {
>         "message": "Cannot read and write in different locations: source: EU, destination: US",
>         "domain": "global",
>         "reason": "invalid"
>       }
>     ],
>     "status": "INVALID_ARGUMENT"
>   }
> }
> {code}
> This suggests that the logic Beam uses to infer the location in which to
> create the temporary dataset fails when confronted with the special syntax
> for external queries. It therefore seems that the location should be exposed
> as a parameter of BigQuerySource.
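> For illustration only, this is roughly what I have in mind; the location
> keyword argument below does not exist on BigQuerySource today and is purely
> hypothetical:
> {code:python}
> # Hypothetical sketch: BigQuerySource currently has no `location` parameter.
> # The idea is to let the caller pin the query job / temporary dataset to the
> # region the data lives in, instead of Beam defaulting it (which resolves to US).
> pcoll = p | "Load " + name >> beam.io.Read(
>     beam.io.BigQuerySource(
>         query=query,
>         use_standard_sql=True,
>         location="EU",  # hypothetical parameter matching the data's region
>     )
> )
> {code}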