Pablo Estrada created BEAM-6684:
-----------------------------------
Summary: BigQueryIO: Unable to create dataset "Location unknown is not yet publicly available"
Key: BEAM-6684
URL: https://issues.apache.org/jira/browse/BEAM-6684
Project: Beam
Issue Type: Improvement
Components: io-java-gcp
Affects Versions: 2.10.0
Reporter: Pablo Estrada
Assignee: Pablo Estrada
My understanding is that BigQueryIO runs the query, writes the output to a temp
dataset, and then extracts the temp dataset to GCS. This means the location of
the temp dataset (if not manually set) is determined by the tables referenced
in the query. This is confirmed in the source code for BigQueryIO:
https://github.com/apache/beam/blob/v2.6.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java#L111
So I would expect the temp dataset to be created in the US location as well,
or to default to the US. Instead, the location appears to default to "unknown"
(at least some of the time), which causes the whole Dataflow job to fail.
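
The fallback I would expect can be sketched roughly as below. This is not
Beam's code: TempDatasetLocation, resolve, and isUsable are hypothetical names
used only for illustration, and treating the literal string "unknown" as an
unresolved location is an assumption based on the error message above.

```java
import java.util.Arrays;
import java.util.List;

public class TempDatasetLocation {
    // Assumed fallback: the report expects the US when nothing else is known.
    static final String DEFAULT_LOCATION = "US";

    /**
     * Picks a location for the temp dataset (hypothetical helper).
     *
     * @param explicitLocation location set manually by the user, or null
     * @param referencedTableLocations locations of the tables the query reads;
     *        entries may be null or "unknown" if the lookup did not succeed
     */
    static String resolve(String explicitLocation, List<String> referencedTableLocations) {
        // A manually set location always wins.
        if (isUsable(explicitLocation)) {
            return explicitLocation;
        }
        // Otherwise take the first referenced table whose location is known.
        for (String location : referencedTableLocations) {
            if (isUsable(location)) {
                return location;
            }
        }
        // Never hand "unknown" to the dataset-creation call; fall back instead.
        return DEFAULT_LOCATION;
    }

    // A location is usable if it is non-empty and not the "unknown" placeholder.
    static boolean isUsable(String location) {
        return location != null
            && !location.trim().isEmpty()
            && !location.equalsIgnoreCase("unknown");
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, Arrays.asList("unknown", "EU")));     // EU
        System.out.println(resolve(null, Arrays.asList("unknown")));           // US
        System.out.println(resolve("asia-northeast1", Arrays.asList("US")));   // asia-northeast1
    }
}
```

The point of the sketch is only the ordering: explicit setting first, then the
referenced tables, then a fixed default, so that an unresolved lookup never
reaches dataset creation.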