[ https://issues.apache.org/jira/browse/BEAM-10172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17548626#comment-17548626 ]

Danny McCormick commented on BEAM-10172:
----------------------------------------

This issue has been migrated to https://github.com/apache/beam/issues/20249

> BigQuerySource external data source support in non-US regions
> -------------------------------------------------------------
>
>                 Key: BEAM-10172
>                 URL: https://issues.apache.org/jira/browse/BEAM-10172
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>    Affects Versions: 2.18.0
>         Environment: DirectRunner, DataflowRunner
>            Reporter: Boris Shilov
>            Priority: P3
>
> I am attempting to query an [external data 
> source|https://cloud.google.com/bigquery/external-data-sources], a MySQL 
> database that is exposed via the BigQuery API, located in the EU region. I 
> have the following format query string:
> {code:python}
> query = """
>     SELECT * 
>     FROM EXTERNAL_QUERY("my-project-one-253518.eu.external-source", 
>     "SELECT * FROM my schema.mytable;");
>     """
> {code}
> And the following pipeline instantiation:
> {code:python}
>     pcoll = p | "Load " + name >> beam.io.Read(
>         beam.io.BigQuerySource(query=query, use_standard_sql=True)
>     )
> {code}
> When I run this, I see the following output:
> {code}
> WARNING:root:Dataset 
> my-project-two:temp_dataset_f07dd1398b0443edaa67c360f5be6958 does not exist 
> so we will create it as temporary with location=None
> ERROR:root:Exception at bundle 
> <apache_beam.runners.direct.bundle_factory._Bundle object at 0x127124640>, 
> due to an exception.
>  Traceback (most recent call last):
>   File 
> "venv/lib/python3.7/site-packages/apache_beam/runners/direct/executor.py", 
> line 345, in call
>     finish_state)
>   File 
> "venv/lib/python3.7/site-packages/apache_beam/runners/direct/executor.py", 
> line 385, in attempt_call
>     result = evaluator.finish_bundle()
>   File 
> "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py",
>  line 323, in finish_bundle
>     bundles = _read_values_to_bundles(reader)
>   File 
> "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py",
>  line 310, in _read_values_to_bundles
>     read_result = [GlobalWindows.windowed_value(e) for e in reader]
>   File 
> "venv/lib/python3.7/site-packages/apache_beam/runners/direct/transform_evaluator.py",
>  line 310, in <listcomp>
>     read_result = [GlobalWindows.windowed_value(e) for e in reader]
>   File 
> "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 
> 937, in __iter__
>     flatten_results=self.flatten_results):
>   File 
> "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 
> 710, in run_query
>     page_token, location=location)
>   File "venv/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 
> 209, in wrapper
>     return fun(*args, **kwargs)
>   File 
> "venv/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 
> 384, in _get_query_results
>     response = self.client.jobs.GetQueryResults(request)
>   File 
> "venv/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py",
>  line 312, in GetQueryResults
>     config, request, global_params=global_params)
>   File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 
> 731, in _RunMethod
>     return self.ProcessHttpResponse(method_config, http_response, request)
>   File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 
> 737, in ProcessHttpResponse
>     self.__ProcessHttpResponse(method_config, http_response, request))
>   File "venv/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 
> 604, in __ProcessHttpResponse
>     http_response, method_config=method_config, request=request)
> apitools.base.py.exceptions.HttpBadRequestError: HttpError accessing 
> <https://www.googleapis.com/bigquery/v2/projects/my-project-two/queries/636272a8e026434d85200b3f14f719ed?alt=json&location=US&maxResults=10000>:
>  response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 
> 'application/json; charset=UTF-8', 'date': 'Tue, 02 Jun 2020 11:29:27 GMT', 
> 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 
> 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 
> 'alt-svc': 'h3-27=":443"; ma=2592000,h3-25=":443"; ma=2592000,h3-T050=":443"; 
> ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q049=":443"; 
> ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; 
> ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"', 
> 'transfer-encoding': 'chunked', 'status': '400', 'content-length': '354', 
> '-content-encoding': 'gzip'}>, content <{
>   "error": {
>     "code": 400,
>     "message": "Cannot read and write in different locations: source: EU, 
> destination: US",
>     "errors": [
>       {
>         "message": "Cannot read and write in different locations: source: EU, 
> destination: US",
>         "domain": "global",
>         "reason": "invalid"
>       }
>     ],
>     "status": "INVALID_ARGUMENT"
>   }
> }
> {code}
> This likely indicates that the logic Beam uses to infer the location in 
> which to create the temporary dataset fails when confronted with the special 
> syntax for external queries. It therefore seems the location should be 
> exposed as a parameter of BigQuerySource.
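
A possible workaround until a location parameter exists on BigQuerySource: run the EXTERNAL_QUERY with the google-cloud-bigquery client, which does let the job location be set explicitly, materialize the result into a table in the same region, and point Beam at that table instead. This is only a sketch; the connection, project, and table names are placeholders, not values from this report.

{code:python}
def external_query(connection, inner_sql):
    # Wrap a federated statement in BigQuery's EXTERNAL_QUERY syntax.
    return 'SELECT * FROM EXTERNAL_QUERY("{0}", "{1}")'.format(
        connection, inner_sql)

def stage_in_region(project, destination, query, location="EU"):
    """Run `query` as a BigQuery job pinned to `location`, writing the
    result to the `destination` table (e.g. "proj.eu_dataset.staging").
    The destination dataset must already exist in that region."""
    from google.cloud import bigquery  # imported lazily

    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(
        destination=destination,
        write_disposition="WRITE_TRUNCATE",
    )
    client.query(query, job_config=job_config, location=location).result()
{code}

The staged table can then be read with beam.io.BigQuerySource(table="proj:eu_dataset.staging"), sidestepping the cross-region temporary dataset entirely.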



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
