[
https://issues.apache.org/jira/browse/BEAM-5457?focusedWorklogId=146359&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-146359
]
ASF GitHub Bot logged work on BEAM-5457:
----------------------------------------
Author: ASF GitHub Bot
Created on: 21/Sep/18 12:26
Start Date: 21/Sep/18 12:26
Worklog Time Spent: 10m
Work Description: joar opened a new pull request #6463: [BEAM-5457] Make
BigQuerySource work for the EU
URL: https://github.com/apache/beam/pull/6463
... among others.
This lets `BigQuerySource` accept a `location` argument. It addresses the
issue where `location=None` causes the temporary dataset to be created in
the US when running through DirectRunner.
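
As a rough illustration of the idea (not the code in this pull request), the
sketch below shows how an optional `location` could be forwarded into the
request that creates the temporary dataset. `QuerySourceSketch` and
`temp_dataset_request` are hypothetical names invented for this example; the
request body follows the shape of a BigQuery `datasets.insert` call, where an
omitted `location` falls back to the US default.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuerySourceSketch:
    """Hypothetical stand-in for BigQuerySource; not the Beam class."""

    query: str
    project: str
    use_standard_sql: bool = False
    location: Optional[str] = None  # assumed new argument, e.g. "EU"

    def temp_dataset_request(self) -> dict:
        """Build the body for creating the temporary dataset."""
        body = {
            "datasetReference": {
                "projectId": self.project,
                "datasetId": "temp_dataset_" + uuid.uuid4().hex,
            }
        }
        # Only set the location when the caller supplied one; leaving it
        # out lets BigQuery apply its default (US), which is the behaviour
        # that triggers the reported error for EU tables.
        if self.location is not None:
            body["location"] = self.location
        return body


source = QuerySourceSketch(
    query="SELECT x, y FROM `my-other-project.mydataset.my_european_table`",
    project="myproject",
    use_standard_sql=True,
    location="EU",
)
request_body = source.temp_dataset_request()
```

With `location="EU"` the temporary dataset would be requested in the same
location as the table being queried.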
Issue Time Tracking
-------------------
Worklog Id: (was: 146359)
Time Spent: 10m
Remaining Estimate: 0h
> BigQuerySource(query=...) in DirectRunner creates temp dataset in the wrong
> location
> ------------------------------------------------------------------------------------
>
> Key: BEAM-5457
> URL: https://issues.apache.org/jira/browse/BEAM-5457
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Affects Versions: 2.6.0
> Reporter: Joar Wandborg
> Assignee: Ahmet Altay
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> I'm in the EU. If I have a
>
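> {code:python}
> from apache_beam.io import BigQuerySource
>
> # Query a table stored in the EU, billing the job to "myproject".
> BigQuerySource(
>     query="SELECT x, y FROM `my-other-project.mydataset.my_european_table`",
>     project="myproject",
>     use_standard_sql=True,
> )
> {code}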
> and then run the pipeline through the DirectRunner, I get the following
> warning and error:
> {noformat}
> 2018-09-21 11:39:52,620 WARNING root create_temporary_dataset
> Dataset myproject:temp_dataset_0bbb28f014a24225b668a67341f4f71e does not
> exist so we will create it as temporary with location=None {noformat}
> {noformat}
> HttpBadRequestError: HttpError accessing
> <https://www.googleapis.com/bigquery/v2/projects/myproject/queries/xyz123?alt=json&maxResults=10000>:
> response: <{'status': '400', 'content-length': '354', 'x-xss-protection':
> '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding':
> 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF',
> '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Fri, 21 Sep
> 2018 09:39:55 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443";
> ma=2592000; v="44,43,39,35"', 'content-type': 'application/json;
> charset=UTF-8'}>, content <{
>  "error": {
>   "code": 400,
>   "message": "Cannot read and write in different locations: source: EU, destination: US",
>   "errors": [
>    {
>     "message": "Cannot read and write in different locations: source: EU, destination: US",
>     "domain": "global",
>     "reason": "invalid"
>    }
>   ],
>   "status": "INVALID_ARGUMENT"
>  }
> {noformat}
> There's a TODO in the code that looks very related:
> https://github.com/apache/beam/blob/d691a86b8fd082efd0fd71c3cb58b7d61442717d/sdks/python/apache_beam/io/gcp/bigquery.py#L665
>
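
For anyone triaging similar reports, here is a small sketch that pulls the
two locations out of the error message quoted above. It assumes the message
wording stays exactly as shown in this issue; `parse_location_mismatch` is a
hypothetical helper, not part of Beam or the BigQuery client.

```python
import re
from typing import Optional, Tuple

# Pattern based solely on the message text quoted in this issue.
_LOCATION_MISMATCH = re.compile(
    r"Cannot read and write in different locations: "
    r"source: (?P<source>[\w-]+), destination: (?P<destination>[\w-]+)"
)


def parse_location_mismatch(message: str) -> Optional[Tuple[str, str]]:
    """Return (source, destination) locations, or None if no match."""
    match = _LOCATION_MISMATCH.search(message)
    if match is None:
        return None
    return match.group("source"), match.group("destination")


locations = parse_location_mismatch(
    "Cannot read and write in different locations: source: EU, "
    "destination: US"
)
```

Here the source location is that of the queried table and the destination is
that of the temporary dataset, so a mismatch points at the dataset having
been created with the default location.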
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)