spencerhuang opened a new issue, #64205:
URL: https://github.com/apache/airflow/issues/64205

   ### Description
   
   Using Airflow 3.1.8
   
   Hi all, 
   
   I've been experimenting with Kafka Message Queue Trigger. As in CDC Debezium 
-> Kafka -> AssetWatcher +  Kafka Message Queue Trigger -> AssetEvent -> DAG 
   
   It's somewhat disheartening. One day in, I noticed that AssetEvent table and 
Kafka's _consumer_offsets are starting to diverge. Stale AssetEvent rows in 
Airflow's Postgres Metastore were separate from Kafka offsets. Resetting Kafka 
consumer offsets had no effect on these already-queued events. Each stale event 
consumed a DAG run, blocking the next actual event from being processed within 
the timeout. It's likely caused by a network partition.
   
   Is this a low probability scenario, or is this a practical concern? Most 
business would require Strict Exactly-Once. My understanding is that this is 
not a matter of If, but how to handle the fallout when it happens. Is there a 
way to improve upon this, any ideas?
   
   ### Use case/motivation
   
   A way to detect drift or divergence and a contingency/approach to re-align 
with Kafka.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to