kennknowles opened a new issue, #19442:
URL: https://github.com/apache/beam/issues/19442

   When the BigQuery source is used with a SQL query in a pipeline, the job's "processing location" is not taken into account when the job is polled, and the pipeline fails.
   
   For example, consider the following, which uses `BigQuerySource` to read from BigQuery with a SQL query. The BigQuery dataset and tables are located in `australia-southeast1`. The query is submitted successfully ([Beam works out the processing location by examining the first table referenced in the query and sets it accordingly](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L221)), but when Beam polls for the job status after submission, the request fails because it does not set `location` to `australia-southeast1`, which BigQuery requires:
   
    
   ```
   
   p | 'read' >> beam.io.Read(beam.io.BigQuerySource(
       use_standard_sql=True,
       query='SELECT * from `a_project_id.dataset_in_australia.table_in_australia`'))
   ```
   
    
   ```
   
   HttpNotFoundError: HttpError accessing <https://www.googleapis.com/bigquery/v2/projects/a_project_id/queries/5ad9cc803baa432290b6cd0203f556d9?alt=json&maxResults=10000>:
   response: <{'status': '404', 'content-length': '328', 'x-xss-protection': '1; mode=block', 'x-content-type-options': 'nosniff', 'transfer-encoding': 'chunked', 'vary': 'Origin, X-Origin, Referer', 'server': 'ESF', '-content-encoding': 'gzip', 'cache-control': 'private', 'date': 'Tue, 26 Mar 2019 03:11:32 GMT', 'x-frame-options': 'SAMEORIGIN', 'alt-svc': 'quic=":443"; ma=2592000; v="46,44,43,39"', 'content-type': 'application/json; charset=UTF-8'}>,
   content <{
     "error": {
       "code": 404,
       "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
       "errors": [
         {
           "message": "Not found: Job a_project_id:5ad9cc803baa432290b6cd0203f556d9",
           "domain": "global",
           "reason": "notFound"
         }
       ],
       "status": "NOT_FOUND"
     }
   }
   
   ```
   
    
   
   The problem can be seen/found here:
   
   
[https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571](https://github.com/apache/beam/blob/v2.11.0/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L571)
   
   
[https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L357)
   
   The location of the job (in this case `australia-southeast1`) needs to be set or inferred (or exposed via the API); otherwise the status request fails.
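
To illustrate the kind of change being asked for, here is a minimal sketch that builds the polling URL from the error above with the job's location attached. It is not Beam's code: the helper name `build_query_results_url` and the dict-shaped `job_reference` (assumed to mirror the `jobReference` object BigQuery returns for a submitted job) are hypothetical, and the only thing taken from the BigQuery REST API is the `location` query parameter.

```python
from urllib.parse import urlencode

# Base URL as it appears in the error message above.
BIGQUERY_BASE_URL = "https://www.googleapis.com/bigquery/v2"


def build_query_results_url(job_reference, max_results=10000):
    """Build the polling URL for a query job, carrying its location along.

    `job_reference` is assumed to be a dict shaped like the `jobReference`
    object returned when the job was submitted (hypothetical input shape).
    """
    params = {"alt": "json", "maxResults": max_results}
    location = job_reference.get("location")
    if location:
        # The failing request in this report had no location parameter
        # for a job that ran in australia-southeast1.
        params["location"] = location
    return "{}/projects/{}/queries/{}?{}".format(
        BIGQUERY_BASE_URL,
        job_reference["projectId"],
        job_reference["jobId"],
        urlencode(params),
    )


job_reference = {
    "projectId": "a_project_id",
    "jobId": "5ad9cc803baa432290b6cd0203f556d9",
    "location": "australia-southeast1",
}
print(build_query_results_url(job_reference))
```

The point of the sketch is only that the location known at submission time has to travel with the job ID into every later status or results request.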
   
    For reference, Airflow had the same bug/problem: 
[https://github.com/apache/airflow/pull/4695](https://github.com/apache/airflow/pull/4695)
   
    
   
    
   
   Imported from Jira 
[BEAM-6910](https://issues.apache.org/jira/browse/BEAM-6910). Original Jira may 
contain additional context.
   Reported by: polleyg.

