radhwene opened a new issue, #59408:
URL: https://github.com/apache/airflow/issues/59408
### Apache Airflow version
2.11.0
### If "Other Airflow 2/3 version" selected, which one?
_No response_
### What happened?
When using BigQuery operators in Apache Airflow (for example
`BigQueryInsertJobOperator`) to run DML statements (`UPDATE`, `DELETE`,
`MERGE`), tasks fail if the target table still contains rows in the **BigQuery
Streaming buffer.**
This behavior is expected and documented on the BigQuery side, but Airflow
currently retries the task without any awareness of the streaming buffer state,
which leads to repeated failures and fragile pipelines.
This issue is especially common when operating on BigQuery tables that are
**continuously populated via streaming** (CDC pipelines, Dataflow jobs,
BigQuery Storage Write API, etc.)
### What you think should happen instead?
The Sensor-based approach aligns with Airflow’s explicit and composable
design philosophy
and avoids changing the behavior of existing operators.
It provides a clear and reusable mechanism for users to handle this
documented BigQuery
limitation in a controlled and non-blocking way.
### How to reproduce
1. Create a BigQuery table that is continuously populated via streaming
(for example using the BigQuery Storage Write API, Dataflow, or
`tabledata.insertAll`).
2. Ensure that the table contains rows in the streaming buffer
(this can be verified via the BigQuery Tables API: the `streamingBuffer`
field is present).
3. Create an Airflow DAG using the Google provider with a BigQuery operator,
for example:
- `BigQueryInsertJobOperator`
- executing a DML statement such as `UPDATE`, `DELETE`, or `MERGE`
- targeting the streaming table
4. Trigger the DAG while the streaming buffer is still present.
5. Observe that the BigQuery job fails with an error similar to:
UPDATE or DELETE statement would affect rows in the streaming buffer, which
is not supported
6. Observe that Airflow retries the task without checking the streaming
buffer state, causing repeated failures
until the buffer is eventually flushed.
---
### Expected result
Airflow should provide a mechanism to optionally detect the presence of the
BigQuery streaming buffer
and wait (or defer/reschedule) until it is flushed before executing the DML
job.
### Operating System
Linuex
### Versions of Apache Airflow Providers
Apache Airflow 2.11.x
GCP cloud composer 2
### Deployment
Google Cloud Composer
### Deployment details
_No response_
### Anything else?
_No response_
### Are you willing to submit PR?
- [x] Yes I am willing to submit a PR!
### Code of Conduct
- [x] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]