Pablo Estrada created BEAM-6684:
-----------------------------------

             Summary: BigQueryIO: Unable to create dataset: "Location unknown is not yet publicly available"
                 Key: BEAM-6684
                 URL: https://issues.apache.org/jira/browse/BEAM-6684
             Project: Beam
          Issue Type: Improvement
          Components: io-java-gcp
    Affects Versions: 2.10.0
            Reporter: Pablo Estrada
            Assignee: Pablo Estrada



My understanding is that BigQueryIO runs the query, writes the output to a temp 
dataset, and then exports the temp dataset to GCS. This means that the location 
of the temp dataset (if not set manually) is determined by the tables 
referenced in the query. This is confirmed in the source code for BigQueryIO: 
https://github.com/apache/beam/blob/v2.6.0/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryQuerySource.java#L111

So I would expect the temp dataset to be created in the US location as well, 
or at least to default to US. Instead, the location appears to default to 
"unknown" (at least some of the time), which causes the whole Dataflow job to 
fail.
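
The expected fallback can be sketched as follows. This is a minimal, 
self-contained illustration and not Beam's actual code: the class 
TempDatasetLocation, the method resolve, and the "US" default are all 
hypothetical names and assumptions that only mirror the behavior described 
above (explicit location first, then the location of the referenced tables, 
then a default instead of "unknown").

```java
public class TempDatasetLocation {
    // Assumption: fall back to US, as the report expects. Not taken from Beam.
    static final String DEFAULT_LOCATION = "US";

    // Hypothetical helper: pick the location for the temp dataset.
    // explicitLocation: a location set manually by the user, or null.
    // referencedTableLocation: the location reported for the tables in the query, or null.
    static String resolve(String explicitLocation, String referencedTableLocation) {
        if (isUsable(explicitLocation)) {
            return explicitLocation.trim();
        }
        if (isUsable(referencedTableLocation)) {
            return referencedTableLocation.trim();
        }
        // Neither source gave a real location, so use the default rather than "unknown".
        return DEFAULT_LOCATION;
    }

    // Treat null, blank, and the literal "unknown" as "no location available".
    static boolean isUsable(String location) {
        if (location == null) {
            return false;
        }
        String trimmed = location.trim();
        return !trimmed.isEmpty() && !trimmed.equalsIgnoreCase("unknown");
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "EU"));
        System.out.println(resolve(null, "unknown"));
        System.out.println(resolve("asia-northeast1", "US"));
    }
}
```

With a check like this, an "unknown" table location would never be passed on 
to the dataset creation request.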




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
