spencerhuang opened a new issue, #64205: URL: https://github.com/apache/airflow/issues/64205
### Description Using Airflow 3.1.8 Hi all, I've been experimenting with Kafka Message Queue Trigger. As in CDC Debezium -> Kafka -> AssetWatcher + Kafka Message Queue Trigger -> AssetEvent -> DAG It's somewhat disheartening. One day in, I noticed that AssetEvent table and Kafka's _consumer_offsets are starting to diverge. Stale AssetEvent rows in Airflow's Postgres Metastore were separate from Kafka offsets. Resetting Kafka consumer offsets had no effect on these already-queued events. Each stale event consumed a DAG run, blocking the next actual event from being processed within the timeout. It's likely caused by a network partition. Is this a low probability scenario, or is this a practical concern? Most business would require Strict Exactly-Once. My understanding is that this is not a matter of If, but how to handle the fallout when it happens. Is there a way to improve upon this, any ideas? ### Use case/motivation A way to detect drift or divergence and a contingency/approach to re-align with Kafka. ### Related issues _No response_ ### Are you willing to submit a PR? - [ ] Yes I am willing to submit a PR! ### Code of Conduct - [x] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
