Boris Shilov created BEAM-10172:
-----------------------------------
Summary: BigQuerySource external data source support in non-US regions
Key: BEAM-10172
URL: https://issues.apache.org/jira/browse/BEAM-10172
Project: Beam
Issue Type: Bug
Components: io-py-gcp
Environment: DirectRunner
Reporter: Boris Shilov
I am attempting to query an [external data source|https://cloud.google.com/bigquery/external-data-sources], a MySQL database located in the EU region and exposed through the BigQuery API. My query string has the following form:
{code:python}
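# EXTERNAL_QUERY takes a connection ID of the form PROJECT_ID.LOCATION.CONNECTION_ID
# (here the location is "eu") and a query in the external database's own SQL dialect.
query = """
SELECT *
FROM EXTERNAL_QUERY("my-project-one-253518.eu.external-source",
"SELECT * FROM my_schema.mytable;");
"""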
{code}
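The connection ID passed as the first argument of EXTERNAL_QUERY has the form PROJECT_ID.LOCATION.CONNECTION_ID, so the region ("eu" here) is already present in the query text. A minimal sketch of recovering it, assuming the connection ID is written in that short dotted form inside double quotes. external_query_location is a hypothetical helper for illustration, not part of Beam, and it does not handle domain-scoped project IDs that contain dots or the long projects/.../locations/.../connections/... form.

{code:python}
import re
from typing import Optional

# Assumption: the first EXTERNAL_QUERY argument is a double-quoted
# "PROJECT_ID.LOCATION.CONNECTION_ID" string with no dots in PROJECT_ID.
_CONNECTION_RE = re.compile(
    r'EXTERNAL_QUERY\s*\(\s*"([^".]+)\.([^".]+)\.([^"]+)"', re.IGNORECASE)


def external_query_location(sql: str) -> Optional[str]:
    """Hypothetical helper: return the LOCATION segment of the connection ID, or None."""
    match = _CONNECTION_RE.search(sql)
    if match is None:
        return None
    # Group 2 is the location segment, e.g. "eu" or "us-east1".
    return match.group(2)


query = """
SELECT *
FROM EXTERNAL_QUERY("my-project-one-253518.eu.external-source",
"SELECT * FROM my_schema.mytable;");
"""

print(external_query_location(query))  # eu
{code}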
I read it in the pipeline as follows:
{code:python}
import apache_beam as beam

# `p` is the beam.Pipeline and `name` a label string, both defined elsewhere.
pcoll = p | "Load " + name >> beam.io.Read(
    beam.io.BigQuerySource(query=query, use_standard_sql=True)
)
When I run this, I see the following output:
{code:python}
WARNING:root:Dataset my-project-two:temp_dataset_f07dd1398b0443edaa67c360f5be6958 does not exist so we will create it as temporary with location=None
ERROR:root:Exception at bundle <apache_beam.runners.direct.bundle_factory._Bundle object at 0x127124640>, due to an exception.
Traceback (most recent call last):
  File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/executor.py", line 345, in call
    finish_state)
  File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/executor.py", line 385, in attempt_call
    result = evaluator.finish_bundle()
  File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 323, in finish_bundle
    bundles = _read_values_to_bundles(reader)
  File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 310, in _read_values_to_bundles
    read_result = [GlobalWindows.windowed_value(e) for e in reader]
  File "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py", line 310, in <listcomp>
    read_result = [GlobalWindows.windowed_value(e) for e in reader]
  File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 937, in __iter__
    flatten_results=self.flatten_results):
  File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 710, in run_query
    page_token, location=location)
  File "venv/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 209, in wrapper
    return fun(*args, **kwargs)
  File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 384, in _get_query_results
    response = self.client.jobs.GetQueryResults(request)
  File "venv/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 312, in GetQueryResults
    config, request, global_params=global_params)
  File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
    return self.ProcessHttpResponse(method_config, http_response, request)
  File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
    self.__ProcessHttpResponse(method_config, http_response, request))
  File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
    http_response, method_config=method_config, request=request)
apitools.base.py.exceptions.HttpBadRequestError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects/my-project-two/queries/636272a8e026434d85200b3f14f719ed?alt=json&location=US&maxResults=10000>:
response: <{'vary': 'Origin, X-Origin, Referer', 'content-type':
'application/json; charset=UTF-8', 'date': 'Tue, 02 Jun 2020 11:29:27 GMT',
'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0',
'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff',
'alt-svc': 'h3-27=":443"; ma=2592000,h3-25=":443"; ma=2592000,h3-T050=":443";
ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q049=":443";
ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443";
ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"',
'transfer-encoding': 'chunked', 'status': '400', 'content-length': '354',
'-content-encoding': 'gzip'}>, content <{
  "error": {
    "code": 400,
    "message": "Cannot read and write in different locations: source: EU, destination: US",
    "errors": [
      {
        "message": "Cannot read and write in different locations: source: EU, destination: US",
        "domain": "global",
        "reason": "invalid"
      }
    ],
    "status": "INVALID_ARGUMENT"
  }
}
{code}
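To make the mismatch explicit: the failing GetQueryResults request carries location=US, which matches the destination BigQuery complains about rather than the EU source. This short sketch uses only the Python standard library to pull both facts out of the log; error_body is an abridged copy of the response content, not the full payload.

{code:python}
import json
import re
from urllib.parse import parse_qs, urlsplit

# Request URL and (abridged) error body copied from the log above.
url = ("https://www.googleapis.com/bigquery/v2/projects/my-project-two/queries/"
       "636272a8e026434d85200b3f14f719ed?alt=json&location=US&maxResults=10000")
error_body = ('{"error": {"code": 400, "message": "Cannot read and write in '
              'different locations: source: EU, destination: US"}}')

# Location the client asked BigQuery to use for the query job.
request_location = parse_qs(urlsplit(url).query)["location"][0]

# Locations reported back by BigQuery in the error message.
message = json.loads(error_body)["error"]["message"]
source, destination = re.search(r"source: (\w+), destination: (\w+)", message).groups()

print(request_location, source, destination)  # US EU US
{code}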
This suggests that the logic Beam uses to infer the location in which to create the temporary dataset fails for the special EXTERNAL_QUERY syntax: the temporary dataset is created with location=None, and the query results are then requested with location=US, while the external source is in EU. It therefore seems that the location should be exposed as a parameter of BigQuerySource.
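A minimal sketch of the resolution order such a parameter could follow, assuming a new optional location argument: an explicit value takes precedence over any inferred location, and only when neither is available does the read fall back to a default. choose_query_location and DEFAULT_LOCATION are hypothetical names for illustration, not part of Beam's API, and treating "US" as the fallback is an assumption based on the location=US seen in the failing request.

{code:python}
from typing import Optional

# Assumption: with no location given, the request ends up in the US
# multi-region, as the location=US in the failing request suggests.
DEFAULT_LOCATION = "US"


def choose_query_location(explicit_location: Optional[str],
                          inferred_location: Optional[str]) -> str:
    """Hypothetical resolution order for a proposed BigQuerySource location parameter.

    1. A location passed explicitly by the user always wins.
    2. Otherwise use whatever location could be inferred from the query.
    3. Otherwise fall back to DEFAULT_LOCATION.
    """
    if explicit_location:
        return explicit_location
    if inferred_location:
        return inferred_location
    return DEFAULT_LOCATION


# For EXTERNAL_QUERY nothing was inferred (location=None in the log), so only
# an explicit parameter would steer the temporary dataset and job to EU.
print(choose_query_location(None, None))  # US
print(choose_query_location("EU", None))  # EU
{code}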
--
This message was sent by Atlassian Jira
(v8.3.4#803005)