radhwene opened a new issue, #59408:
URL: https://github.com/apache/airflow/issues/59408

   ### Apache Airflow version
   
   2.11.0
   
   ### If "Other Airflow 2/3 version" selected, which one?
   
   _No response_
   
   ### What happened?
   
   When using BigQuery operators in Apache Airflow (for example 
`BigQueryInsertJobOperator`) to run DML statements (`UPDATE`, `DELETE`, 
`MERGE`), tasks fail if the target table still contains rows in the **BigQuery 
streaming buffer**.
   
   This behavior is expected and documented on the BigQuery side, but Airflow 
currently retries the task without any awareness of the streaming buffer state, 
which leads to repeated failures and fragile pipelines.
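
One way to make retries aware of this failure mode is to recognize the error before deciding how to react. The following is a minimal sketch, not part of Airflow or the Google provider: `is_streaming_buffer_error` and `STREAMING_BUFFER_MARKER` are hypothetical names, and matching on the substring "streaming buffer" is an assumption based on the error text quoted in the reproduction steps of this report.

```python
# Hypothetical helper, not an Airflow or BigQuery API.
# Assumption: BigQuery's streaming-buffer DML errors contain this phrase,
# as in the message quoted in this issue.
STREAMING_BUFFER_MARKER = "streaming buffer"


def is_streaming_buffer_error(message: str) -> bool:
    """Return True if an error message mentions the BigQuery streaming buffer."""
    return STREAMING_BUFFER_MARKER in message.lower()
```

A task wrapper could use such a check to wait for the buffer to flush instead of burning a retry on a failure that is known to repeat.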
   
   This issue is especially common when operating on BigQuery tables that are 
**continuously populated via streaming** (CDC pipelines, Dataflow jobs, 
BigQuery Storage Write API, etc.).
   
   ### What you think should happen instead?
   
   Airflow should offer a Sensor that waits until the target table's streaming 
buffer has been flushed before the DML task runs. A Sensor-based approach aligns 
with Airflow’s explicit and composable design philosophy and avoids changing the 
behavior of existing operators.
   
   It provides a clear and reusable mechanism for users to handle this 
documented BigQuery limitation in a controlled and non-blocking way.
   
   ### How to reproduce
   
   
   
   1. Create a BigQuery table that is continuously populated via streaming  
      (for example using the BigQuery Storage Write API, Dataflow, or 
`tabledata.insertAll`).
   
   2. Ensure that the table contains rows in the streaming buffer  
      (this can be verified via the BigQuery Tables API: the `streamingBuffer` 
field is present).
   
   3. Create an Airflow DAG using the Google provider with a BigQuery operator, 
for example:
   
      - `BigQueryInsertJobOperator`
      - executing a DML statement such as `UPDATE`, `DELETE`, or `MERGE`
      - targeting the streaming table
   
   4. Trigger the DAG while the streaming buffer is still present.
   
   5. Observe that the BigQuery job fails with an error similar to:  
      `UPDATE or DELETE statement would affect rows in the streaming buffer, which is not supported`
   
   6. Observe that Airflow retries the task without checking the streaming 
buffer state, causing repeated failures until the buffer is eventually flushed.
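
The check described in step 2 can be sketched as a small function over the JSON resource returned by the BigQuery Tables API (`tables.get`). This is only an illustration: the function names are hypothetical, the table resource is passed in as a plain dict so no client library is assumed, and `estimatedRows` is read as a string-encoded integer because that is how the REST resource represents it.

```python
from typing import Any, Mapping, Optional


def has_streaming_buffer(table_resource: Mapping[str, Any]) -> bool:
    """Return True if the table resource still reports a streaming buffer.

    `table_resource` is assumed to be the dict form of a `tables.get` response;
    the `streamingBuffer` field is only present while rows are buffered.
    """
    return table_resource.get("streamingBuffer") is not None


def estimated_buffered_rows(table_resource: Mapping[str, Any]) -> Optional[int]:
    """Return the estimated number of buffered rows, or None if unavailable."""
    buffer = table_resource.get("streamingBuffer")
    if buffer is None:
        return None
    rows = buffer.get("estimatedRows")
    # The REST API encodes 64-bit integers as strings, hence the int() call.
    return int(rows) if rows is not None else None
```

For example, a resource containing `{"streamingBuffer": {"estimatedRows": "42"}}` would be reported as buffered, while a resource without that field would not.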
   
   ---
   
   ### Expected result
   
   Airflow should provide a mechanism to optionally detect the presence of the 
BigQuery streaming buffer
   and wait (or defer/reschedule) until it is flushed before executing the DML 
job.
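
The waiting behavior described above could look roughly like the loop below. It is a sketch under stated assumptions, not the proposed implementation: `wait_for_streaming_buffer_flush` and `fetch_table` are hypothetical names, `fetch_table` stands in for whatever call returns the `tables.get` resource as a dict, and the loop blocks the caller. In Airflow the same check would more naturally live in a sensor's `poke` method (for example with `mode="reschedule"`) so that no worker slot is held while waiting.

```python
import time
from typing import Any, Callable, Mapping


def wait_for_streaming_buffer_flush(
    fetch_table: Callable[[], Mapping[str, Any]],
    poke_interval: float = 60.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll until the table resource no longer reports a streaming buffer.

    Raises TimeoutError if the buffer is still present after `timeout` seconds.
    `sleep` and `clock` are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        # The `streamingBuffer` field disappears once the buffer is flushed.
        if fetch_table().get("streamingBuffer") is None:
            return
        if clock() >= deadline:
            raise TimeoutError("streaming buffer was not flushed in time")
        sleep(poke_interval)
```

Only after this function returns would the DML job be submitted, which avoids the repeated failures described in the reproduction steps.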
   
   
   
   ### Operating System
   
   Linux
   
   ### Versions of Apache Airflow Providers
   
   Apache Airflow 2.11.x
   Google Cloud Composer 2
   
   ### Deployment
   
   Google Cloud Composer
   
   ### Deployment details
   
   _No response_
   
   ### Anything else?
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [x] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

