[ https://issues.apache.org/jira/browse/BEAM-1909?focusedWorklogId=105058&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-105058 ]
ASF GitHub Bot logged work on BEAM-1909:
----------------------------------------
Author: ASF GitHub Bot
Created on: 23/May/18 11:05
Start Date: 23/May/18 11:05
Worklog Time Spent: 10m
Work Description: DannyLee12 opened a new pull request #5435: [BEAM-1909]
Fix BigQuery read transform fails for DirectRunner when querying non-US regions
URL: https://github.com/apache/beam/pull/5435
Added a fix for the issue mentioned above.
Includes
---
- Add a regex to extract the project ID, dataset ID and table ID from both
legacy and standard SQL queries when the project is provided.
- Add a regex to extract the dataset ID and table ID from both legacy and
standard SQL queries when the project is NOT provided.
- Add the ability to use the project attribute that already existed on the
BigQuerySource class.
- If the project is not provided at all, existing behaviour is left unchanged:
the temporary tables are created with location=None. This defaults to US on the
DirectRunner but *should* work when running on Dataflow.
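
As a rough illustration of the extraction described in the list above, here is a minimal sketch. It is not the code from the pull request: the patterns, the `TableRef` type and the `parse_table_reference` helper are hypothetical names chosen for this example, and the patterns assume simple identifiers (letters, digits, underscores, plus hyphens in the project ID).

```python
import re
from typing import NamedTuple, Optional


class TableRef(NamedTuple):
    """Hypothetical container for the parts of a table reference."""
    project: Optional[str]
    dataset: str
    table: str


# Legacy SQL: [project:dataset.table] or [dataset.table]
_LEGACY = re.compile(
    r"\[(?:(?P<project>[\w\-]+):)?(?P<dataset>\w+)\.(?P<table>\w+)\]")

# Standard SQL: `project.dataset.table` or `dataset.table`
_STANDARD = re.compile(
    r"`(?:(?P<project>[\w\-]+)\.)?(?P<dataset>\w+)\.(?P<table>\w+)`")


def parse_table_reference(query, default_project=None):
    """Return the first table reference found in the query, or None.

    A project named in the query takes precedence; otherwise the
    default_project argument (possibly None) is used.
    """
    match = _LEGACY.search(query) or _STANDARD.search(query)
    if match is None:
        return None
    return TableRef(
        project=match.group("project") or default_project,
        dataset=match.group("dataset"),
        table=match.group("table"),
    )
```

For example, `parse_table_reference("SELECT VehicleId FROM [cartrack.vehicle_info]", "sa-taxi-edw")` picks up the dataset and table from the query and falls back to the supplied project. The sketch only looks at the first table reference and does not handle domain-scoped project IDs.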
Caveats
---
- I have not updated the unit tests. If that is required, please point me in
the right direction.
Confirmation
---
```python
...beam.io.BigQuerySource(
query="""SELECT VehicleId
FROM [cartrack.vehicle_info]""",
use_standard_sql=True, project='sa-taxi-edw'))
```
```
Dataset sa-taxi-edw:temp_dataset_297f3d8de4364961bf63819985e690c8 does not
exist so we will create it as temporary with location=EU
```
```python
...beam.io.BigQuerySource(
query="""SELECT VehicleId
FROM `cartrack.vehicle_info`""",
use_standard_sql=False, project='sa-taxi-edw'))
```
```
WARNING:root:Dataset
sa-taxi-edw:temp_dataset_6cf4c45c5ff54b69a60c0851aba33398 does not exist so we
will create it as temporary with location=EU
```
```python
...beam.io.BigQuerySource(
query="""SELECT VehicleId
FROM `sa-taxi-edw.cartrack.vehicle_info`""",
use_standard_sql=True))
```
```
WARNING:root:Dataset
sa-taxi-edw:temp_dataset_16f490456e744fa78278b2cd95fe3c5b does not exist so we
will create it as temporary with location=EU
```
```python
...beam.io.BigQuerySource(
query="""SELECT VehicleId
FROM [sa-taxi-edw:cartrack.vehicle_info]""",
use_standard_sql=False))
```
```
WARNING:root:Dataset
sa-taxi-edw:temp_dataset_2cab90333b8049aab4d2adbae5fc65e0 does not exist so we
will create it as temporary with location=EU
```
If the project is provided neither in the query nor as an argument, the
location is set to None. In that case, the location defaults to US when using
the DirectRunner.
```
WARNING:root:Dataset
sa-taxi-edw:temp_dataset_8def137665b94f1fb128a23edcf0aa19 does not exist so we
will create it as temporary with location=None
```
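
The fallback behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the pull request's implementation: `temp_dataset_location` is a hypothetical helper, and `get_dataset_location` is a caller-supplied stand-in for whatever lookup returns the location of the source dataset.

```python
from typing import Callable, Optional


def temp_dataset_location(
    project: Optional[str],
    dataset: str,
    get_dataset_location: Callable[[str, str], str],
) -> Optional[str]:
    """Choose the location for the temporary dataset.

    When the source project is unknown, return None so that the previous
    behaviour (no explicit location) is preserved.
    """
    if project is None:
        return None
    # Hypothetical lookup of the source dataset's location.
    return get_dataset_location(project, dataset)


def fake_lookup(project: str, dataset: str) -> str:
    """Stub lookup used only for this example; always reports EU."""
    return "EU"


with_project = temp_dataset_location("sa-taxi-edw", "cartrack", fake_lookup)
without_project = temp_dataset_location(None, "cartrack", fake_lookup)
```

Passing the lookup in as a callable keeps the decision logic testable without any network access.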
Issue Time Tracking
-------------------
Worklog Id: (was: 105058)
Time Spent: 1.5h (was: 1h 20m)
> BigQuery read transform fails for DirectRunner when querying non-US regions
> ---------------------------------------------------------------------------
>
> Key: BEAM-1909
> URL: https://issues.apache.org/jira/browse/BEAM-1909
> Project: Beam
> Issue Type: Bug
> Components: sdk-py-core
> Reporter: Chamikara Jayalath
> Priority: Major
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> See:
> http://stackoverflow.com/questions/42135002/google-dataflow-cannot-read-and-write-in-different-locations-python-sdk-v0-5-5/42144748?noredirect=1#comment73621983_42144748
> This should be fixed by creating the temp dataset and table in the correct
> region.
> cc: [~sb2nov]