nathadfield opened a new issue, #32093:
URL: https://github.com/apache/airflow/issues/32093

   ### Apache Airflow version
   
   2.6.2
   
   ### What happened
   
   When the `GCSToBigQueryOperator` runs in deferrable mode with an `impersonation_chain` service account whose default project_id differs from the project_id passed to the operator, the task fails with the following error:
   
   ```
   [2023-06-23, 11:38:37 UTC] {taskinstance.py:1824} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/usr/local/lib/python3.10/site-packages/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py", line 447, in execute_complete
       raise AirflowException(event["message"])
   airflow.exceptions.AirflowException: 404, message='Not Found: {\n  "error": {\n    "code": 404,\n    "message": "Not found: Job king-cdmr-etl-sandbox:airflow_apptweak_king_itunes_connect_channels_load_active_devices_to_bq_2023_06_22T07_00_00_00_00_4842808969d21632ecbb76ffca48aabd",\n    "errors": [\n      {\n        "message": "Not found: Job king-cdmr-etl-sandbox:airflow_apptweak_king_itunes_connect_channels_load_active_devices_to_bq_2023_06_22T07_00_00_00_00_4842808969d21632ecbb76ffca48aabd",\n        "domain": "global",\n        "reason": "notFound"\n      }\n    ],\n    "status": "NOT_FOUND"\n  }\n}\n', url=URL('https://www.googleapis.com/bigquery/v2/projects/king-cdmr-etl-sandbox/jobs/airflow_apptweak_king_itunes_connect_channels_load_active_devices_to_bq_2023_06_22T07_00_00_00_00_4842808969d21632ecbb76ffca48aabd')
   ```
   
   I believe this happens because the BigQuery load job is submitted against `self.project_id` in [_submit_job](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py#L303), but in deferrable mode the operator then looks for the job in the project given by [self.hook.project_id](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/transfers/gcs_to_bigquery.py#L432).
   
   The default project_id assigned to the impersonation chain service account can differ from the project_id passed to the operator, so these two projects are not necessarily the same.
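To make the mismatch concrete, here is a minimal, self-contained sketch. It does not use Airflow or the Google client libraries. `FakeBigQueryJobs`, `JobNotFound` and `submit_then_poll` are hypothetical stand-ins. The only detail taken from the log is that a job is addressed by project plus job ID (`projects/{project}/jobs/{job_id}`). The job is created in the operator's project and then looked up in the hook's default project, which gives the same kind of 404 as above.

```python
class JobNotFound(Exception):
    """Hypothetical stand-in for the 404 "Not found: Job <project>:<job_id>" error."""


class FakeBigQueryJobs:
    """Hypothetical in-memory stand-in for BigQuery jobs, keyed by (project, job_id)."""

    def __init__(self) -> None:
        self._jobs: dict[tuple[str, str], str] = {}

    def insert(self, project: str, job_id: str) -> None:
        # The job is created in whichever project the caller names.
        self._jobs[(project, job_id)] = "DONE"

    def get(self, project: str, job_id: str) -> str:
        # A job is only found in the project it was created in.
        if (project, job_id) not in self._jobs:
            raise JobNotFound(f"Not found: Job {project}:{job_id}")
        return self._jobs[(project, job_id)]


def submit_then_poll(
    jobs: FakeBigQueryJobs, operator_project: str, hook_project: str, job_id: str
) -> str:
    """Hypothetical helper mirroring the reported flow."""
    jobs.insert(operator_project, job_id)  # like _submit_job using self.project_id
    return jobs.get(hook_project, job_id)  # like defer() using self.hook.project_id


jobs = FakeBigQueryJobs()
try:
    submit_then_poll(jobs, "king-coredatasets-sandbox", "king-cdmr-etl-sandbox", "example_job")
except JobNotFound as exc:
    print(exc)  # Not found: Job king-cdmr-etl-sandbox:example_job
```

The job exists, but only under `king-coredatasets-sandbox`. Looking it up under the hook's project fails, which matches what the screenshot and the error show.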
   
   The error above shows that BigQuery cannot find the job_id `airflow_apptweak_king_itunes_connect_channels_load_active_devices_to_bq_2023_06_22T07_00_00_00_00_4842808969d21632ecbb76ffca48aabd` in the project `king-cdmr-etl-sandbox`.
   
   In fact, this job_id was created successfully in the project `king-coredatasets-sandbox`:
   
   <img width="604" alt="Screenshot 2023-06-23 at 12 40 39" src="https://github.com/apache/airflow/assets/967119/488ea8e2-e447-46b1-814e-419402639a76">
   
   ### What you think should happen instead
   
   I think we should change the call to `self.defer` so that it passes `self.project_id` rather than `self.hook.project_id`.
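Below is a minimal sketch of the proposed change. It assumes the trigger only needs to know which project to poll. `project_for_trigger` and `job_url` are hypothetical helpers, not Airflow API. The actual change would be to the project argument passed in the operator's `self.defer(...)` call. Falling back to the hook's default when the operator has no `project_id` is an assumption and is not confirmed by this issue.

```python
from typing import Optional


def project_for_trigger(operator_project_id: Optional[str], hook_project_id: str) -> str:
    """Pick the project the deferred trigger should poll (hypothetical helper).

    Proposed behaviour: use the operator's project_id, i.e. the project the
    load job was submitted to. The fallback to the hook's default project is
    an assumption for the case where the operator has no project_id.
    """
    return operator_project_id or hook_project_id


def job_url(project_id: str, job_id: str) -> str:
    # Same URL shape as in the log: .../bigquery/v2/projects/{project}/jobs/{job_id}
    return f"https://www.googleapis.com/bigquery/v2/projects/{project_id}/jobs/{job_id}"


print(job_url(project_for_trigger("king-coredatasets-sandbox", "king-cdmr-etl-sandbox"), "example_job"))
# https://www.googleapis.com/bigquery/v2/projects/king-coredatasets-sandbox/jobs/example_job
```

With the operator's project, the polled URL points at `king-coredatasets-sandbox`, which is the project where the screenshot shows the job was actually created.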
   
   ### How to reproduce
   
   I don't yet have exact steps to reproduce, but I will submit a PR for review soon.
   
   ### Operating System
   
   Debian GNU/Linux 11 (bullseye)
   
   ### Versions of Apache Airflow Providers
   
   apache-airflow-providers-google==10.0.0
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

