[
https://issues.apache.org/jira/browse/BEAM-1909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019562#comment-16019562
]
Uwe Jugel commented on BEAM-1909:
---------------------------------
Here are my latest test results regarding this issue:
*Experiments*
# I just tried and failed to query across regions:
{code:sql}
SELECT a.user_id FROM `test_dummy_eu.user_details` a,
`test_dummy_us.user_details` b WHERE a.user_id = b.user_id
-- Error: Cannot process data across locations: EU,US
{code}
# Since we cannot query across regions, I tried to determine the single
location/region of the data source(s) by dry-running the query and then
checking the location of the BQ-internal temp table. This does not work: the
temp table always reports {{None}} (i.e., US) as its location, even if the
source table is in an EU dataset.
# However, we can still *transfer the data to our own temp table from the
query's own temp table using a {{CopyJob}}*, which works across regions. Here
is a gist that demonstrates how to do this via the BigQuery Python SDK:
https://gist.github.com/ubunatic/29352bc2c9ddfc33163cfac47bc1e4d6
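For reference, a minimal sketch of that approach (project and dataset names are placeholders, the {{temp_table_id}} naming scheme is made up, and the client calls follow a newer google-cloud-bigquery interface that may differ from the SDK version used in the gist):

{code:python}
def temp_table_id(job_id):
    # Hypothetical naming scheme: derive a legal table id from the query job id.
    return "temp_" + job_id.replace("-", "_")

def copy_query_result(project, query, temp_dataset):
    """Run a query, then copy its BQ-internal anonymous temp table into our
    own temp table via a copy job, which also works across regions."""
    from google.cloud import bigquery  # requires google-cloud-bigquery

    client = bigquery.Client(project=project)
    query_job = client.query(query)
    query_job.result()  # wait; the result lands in an anonymous temp table

    source = query_job.destination  # the query's own temp table
    dest = bigquery.TableReference(
        bigquery.DatasetReference(project, temp_dataset),
        temp_table_id(query_job.job_id))

    client.copy_table(source, dest).result()  # CopyJob, instead of SELECT *
    return dest
{code}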
*Note*:
I believe a {{CopyJob}} is the appropriate way of copying any table to a temp
table, especially for non-query sources, which we currently read with a
{{SELECT *}}. That query may be billed to the user (?), even though the read
should be covered by the free data export quotas (see
https://cloud.google.com/bigquery/docs/exporting-data and
https://cloud.google.com/bigquery/pricing#free).
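The same idea, sketched for a plain (non-query) table source: copy it directly into the temp dataset instead of materializing it with a {{SELECT *}} (all names are placeholders; the {{_copy}} suffix is a made-up convention):

{code:python}
def copy_table_id(src_table):
    # Hypothetical naming scheme for the destination table.
    return src_table + "_copy"

def copy_source_table(project, src_dataset, src_table, temp_dataset):
    """Copy a source table into the pipeline's temp dataset with a copy job,
    instead of reading it via SELECT * (which may be billed as a query)."""
    from google.cloud import bigquery  # requires google-cloud-bigquery

    client = bigquery.Client(project=project)
    source = bigquery.TableReference(
        bigquery.DatasetReference(project, src_dataset), src_table)
    dest = bigquery.TableReference(
        bigquery.DatasetReference(project, temp_dataset),
        copy_table_id(src_table))
    client.copy_table(source, dest).result()
    return dest
{code}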
*Links:*
||Description||Link||
|CopyJob (py)|
https://github.com/GoogleCloudPlatform/google-cloud-python/blob/master/bigquery/google/cloud/bigquery/job.py|
|copy job (API)| https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs|
|BQ-read in DataFlow == BQ-export|
https://cloud.google.com/bigquery/docs/exporting-data|
|free BQ-export| https://cloud.google.com/bigquery/pricing#free|
|costly(?) "SELECT *" for non-queries|
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py|
> BigQuery read transform fails for DirectRunner when querying non-US regions
> ---------------------------------------------------------------------------
>
> Key: BEAM-1909
> URL: https://issues.apache.org/jira/browse/BEAM-1909
> Project: Beam
> Issue Type: Bug
> Components: sdk-py
> Reporter: Chamikara Jayalath
>
> See:
> http://stackoverflow.com/questions/42135002/google-dataflow-cannot-read-and-write-in-different-locations-python-sdk-v0-5-5/42144748?noredirect=1#comment73621983_42144748
> This should be fixed by creating the temp dataset and table in the correct
> region.
> cc: [~sb2nov]
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)