mfjackson opened a new issue, #29301:
URL: https://github.com/apache/airflow/issues/29301

   ### Apache Airflow Provider(s)
   
   google
   
   ### Versions of Apache Airflow Providers
   
   I'm using `apache-airflow-providers-google==8.2.0`, but the code that appears to cause this behavior is still present as of `8.8.0`.
   
   ### Apache Airflow version
   
   2.3.2
   
   ### Operating System
   
   Debian (from Docker image `apache/airflow:2.3.2-python3.10`)
   
   ### Deployment
   
   Official Apache Airflow Helm Chart
   
   ### Deployment details
   
   Deployed on an EKS cluster via Helm.
   
   ### What happened
   
   The first task in one of my DAGs is to create an empty BigQuery table using 
the `BigQueryCreateEmptyTableOperator` as follows:
   
   ```python
   create_staging_table = BigQueryCreateEmptyTableOperator(
       task_id="create_staging_table",
       dataset_id="my_dataset",
       table_id="tmp_table",
       schema_fields=[
           {"name": "field_1", "type": "TIMESTAMP", "mode": "NULLABLE"},
           {"name": "field_2", "type": "INTEGER", "mode": "NULLABLE"},
           {"name": "field_3", "type": "INTEGER", "mode": "NULLABLE"}
       ],
       exists_ok=False
   )
   ```
   Note that `exists_ok=False` is set explicitly here, but it is also the default value.
   
   This task exits with a `SUCCESS` status even when `my_dataset.tmp_table` already exists in the target BigQuery project. The task produces the following logs:
   
   ```
   [2023-02-02, 05:52:29 UTC] {bigquery.py:875} INFO - Creating table
   [2023-02-02, 05:52:29 UTC] {bigquery.py:901} INFO - Table my_dataset.tmp_table already exists.
   [2023-02-02, 05:52:30 UTC] {taskinstance.py:1395} INFO - Marking task as SUCCESS. dag_id=my_fake_dag, task_id=create_staging_table, execution_date=20230202T044000, start_date=20230202T055229, end_date=20230202T055230
   [2023-02-02, 05:52:30 UTC] {local_task_job.py:156} INFO - Task exited with return code 0
   ```
   
   ### What you think should happen instead
   
   Setting `exists_ok=False` should raise an exception and fail the task with a `FAILED` status if the table being created already exists in BigQuery.
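   
   To illustrate the behavior I'd expect, here is a standalone sketch. The helper name `create_table_strict` is hypothetical and not part of the provider; it only shows the error propagation I have in mind:
   
   ```python
   from google.api_core.exceptions import Conflict
   from google.cloud import bigquery
   
   from airflow.exceptions import AirflowException
   
   
   def create_table_strict(
       client: bigquery.Client, table: bigquery.Table, exists_ok: bool = False
   ) -> bigquery.Table:
       """Create `table`, failing the surrounding task when it already exists and exists_ok is False."""
       try:
           # With exists_ok=True the client swallows the "already exists" error itself,
           # so this except branch is only reached when exists_ok is False.
           return client.create_table(table, exists_ok=exists_ok)
       except Conflict as err:
           # Raising (rather than just logging) lets Airflow mark the task as FAILED.
           raise AirflowException(f"Table {table.table_id} already exists.") from err
   ```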
   
   ### How to reproduce
   
   1. Deploy Airflow 2.3.2 running Python 3.10 (any deployment method).
   2. Ensure `apache-airflow-providers-google==8.2.0` (or `8.8.0`; I don't believe the issue has been fixed there) is installed on the deployment.
   3. Set up a GCP project and create a BigQuery dataset.
   4. Create an empty BigQuery table with a schema in that dataset.
   5. Create a DAG that uses the `BigQueryCreateEmptyTableOperator` with `exists_ok=False` to create a table with the same dataset and table IDs (a minimal example is sketched after this list).
   6. Run the DAG from step 5 on the Airflow instance deployed in step 1.
   7. Observe that the task is marked `SUCCESS` even though the table already exists.
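   
   For steps 5 and 6, a minimal DAG along these lines reproduces it for me (the dataset/table names are placeholders, and the default `google_cloud_default` connection is assumed):
   
   ```python
   from datetime import datetime
   
   from airflow import DAG
   from airflow.providers.google.cloud.operators.bigquery import BigQueryCreateEmptyTableOperator
   
   with DAG(
       dag_id="bq_exists_ok_repro",
       start_date=datetime(2023, 1, 1),
       schedule_interval=None,
       catchup=False,
   ) as dag:
       # The target table (my_dataset.tmp_table) already exists from step 4, so with
       # exists_ok=False this task should fail, but it is marked SUCCESS instead.
       create_staging_table = BigQueryCreateEmptyTableOperator(
           task_id="create_staging_table",
           dataset_id="my_dataset",
           table_id="tmp_table",
           schema_fields=[{"name": "field_1", "type": "TIMESTAMP", "mode": "NULLABLE"}],
           exists_ok=False,
       )
   ```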
   
   ### Anything else
   
   I believe the silent failure may be occurring [here](https://github.com/apache/airflow/blob/main/airflow/providers/google/cloud/operators/bigquery.py#L1377): the `except` block only logs a message and doesn't raise an exception or set a state that would cause the task to fail.
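   
   In other words, I suspect the effective behavior is equivalent to the following self-contained sketch (paraphrased, not the provider's actual code; the project/dataset/table names are placeholders):
   
   ```python
   from google.api_core.exceptions import Conflict
   from google.cloud import bigquery
   
   client = bigquery.Client()
   table = bigquery.Table(
       "my-project.my_dataset.tmp_table",
       schema=[bigquery.SchemaField("field_1", "TIMESTAMP", mode="NULLABLE")],
   )
   
   try:
       # exists_ok=False makes the client raise Conflict for an existing table...
       client.create_table(table, exists_ok=False)
   except Conflict:
       # ...but catching it and only logging means nothing propagates to Airflow,
       # so the task instance still exits with return code 0 and SUCCESS.
       print("Table my_dataset.tmp_table already exists.")
   ```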
   
   If this is in fact the case, I'd be happy to submit a PR, but I'd appreciate any input on the error-handling standards and conventions this provider package maintains.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

