Yohei Onishi created AIRFLOW-3571:
-------------------------------------

             Summary: GoogleCloudStorageToBigQueryOperator succeeds in uploading a CSV file from GCS to BigQuery, but the task fails
                 Key: AIRFLOW-3571
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3571
             Project: Apache Airflow
          Issue Type: Bug
          Components: contrib
    Affects Versions: 1.10.0
            Reporter: Yohei Onishi
I am using the following services in the asia-northeast1-c zone:
* GCS: asia-northeast1-c
* BigQuery dataset and table: asia-northeast1-c
* Composer: asia-northeast1-c

My task, created by GoogleCloudStorageToBigQueryOperator, succeeded in uploading a CSV file from a GCS bucket to a BigQuery table, but the task itself failed with the following error:

{code:java}
[2018-12-26 21:35:47,464] {base_task_runner.py:107} INFO - Job 146: Subtask bq_load_data_into_dest_table_from_gcs
[2018-12-26 21:35:47,464] {discovery.py:871} INFO - URL being requested: GET https://www.googleapis.com/bigquery/v2/projects/fr-stg-datalake/jobs/job_QQE9TDEu88mfdw_fJHHEo9FtjXja?alt=json
[2018-12-26 21:35:47,931] {models.py:1736} ERROR - ('BigQuery job status check failed. Final error was: %s', 404)
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 981, in run_with_configuration
    jobId=self.running_job_id).execute()
  File "/usr/local/lib/python3.6/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/googleapiclient/http.py", line 851, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: <HttpError 404 when requesting https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json returned "Not found: Job my-project:job_abc123">

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/contrib/operators/gcs_to_bq.py", line 237, in execute
    time_partitioning=self.time_partitioning)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 951, in run_load
    return self.run_with_configuration(configuration)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 1003, in run_with_configuration
    err.resp.status)
Exception: ('BigQuery job status check failed. Final error was: %s', 404)
{code}

The task failed because it could not find the job {color:#FF0000}my-project:job_abc123{color}, but the correct job id is {color:#FF0000}my-project:asia-northeast1:job_abc123{color}. (Note: these are example ids, not the actual ones.) I suspect the operator does not handle the job's location (region) properly.

{code:java}
$ bq show -j my-project:asia-northeast1:job_abc123
Job my-project:asia-northeast1:job_abc123

  Job Type   State     Start Time        Duration   User Email                    Bytes Processed   Bytes Billed   Billing Tier   Labels
 ---------- --------- ----------------- ---------- ----------------------------- ----------------- -------------- -------------- --------
  load       SUCCESS   27 Dec 05:35:47   0:00:01    my-service-account-id-email
{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
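To illustrate the suspected cause: the BigQuery v2 {{jobs.get}} endpoint accepts an optional {{location}} parameter, and without it BigQuery only looks for the job in the default (US/EU) regions, which would explain the 404 for an asia-northeast1 job. The sketch below just builds the two request URLs side by side; the function name and the placeholder ids are hypothetical, not part of the hook's actual code.

```python
# Hypothetical sketch: compare the jobs.get URL the hook appears to
# request (no location) with one that includes the job's region.
from urllib.parse import urlencode


def job_status_url(project_id, job_id, location=None):
    """Build a BigQuery v2 jobs.get URL, optionally with a location."""
    base = ("https://www.googleapis.com/bigquery/v2/"
            f"projects/{project_id}/jobs/{job_id}")
    params = {"alt": "json"}
    if location:
        # Including the region lets BigQuery resolve jobs that run
        # outside the default locations.
        params["location"] = location
    return f"{base}?{urlencode(params)}"


# The form seen in the traceback above, which 404s for a
# regional job (placeholder ids, as in the report):
print(job_status_url("my-project", "job_abc123"))
# → https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json

# The form that should resolve the asia-northeast1 job:
print(job_status_url("my-project", "job_abc123", location="asia-northeast1"))
# → https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json&location=asia-northeast1
```

If the fix is made in {{run_with_configuration}}, the hook would presumably need to pass the dataset's location through to its polling call.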