collinmcnulty opened a new issue, #49900:
URL: https://github.com/apache/airflow/issues/49900

   ### Description
   
   There should be an option on the AssetWatcher to respect an idempotency key 
that a TriggerEvent could include in its payload when yielded, so that a new DAG 
run is only created when a new key shows up. This is already the case for task 
sensors for HA purposes, so the DAG run creation logic could surely respect that too.
   
   This way you could make S3KeyTrigger and similar triggers work by having 
them put the name of the detected key (or its modified time, if the key is 
fully defined by the trigger) in the TriggerEvent payload. Then only one DAG 
run would be created each time the condition becomes true, rather than 
scheduling runs infinitely.
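
   To make the idea concrete, here is a rough sketch of the deduplication I have in mind. It is plain Python and does not use Airflow at all: `Event`, `DedupingWatcher`, and the `idempotency_key` payload field are hypothetical names made up for illustration, not existing Airflow APIs, and a real implementation would need to persist the seen keys somewhere rather than keep them in memory.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Event:
    """Stand-in for a trigger event; NOT Airflow's TriggerEvent class."""
    payload: dict[str, Any]


@dataclass
class DedupingWatcher:
    """Hypothetical watcher that creates one run per distinct idempotency key."""
    key_field: str = "idempotency_key"  # assumed payload field name
    seen: set = field(default_factory=set)
    runs: list = field(default_factory=list)

    def handle(self, event: Event) -> bool:
        """Return True if a run was created, False if the event was a repeat."""
        key = event.payload.get(self.key_field)
        if key is not None:
            if key in self.seen:
                # Same key observed again: the condition is still true,
                # but a run for it already exists, so do nothing.
                return False
            self.seen.add(key)
        # Events without a key keep today's behavior: always create a run.
        self.runs.append(event.payload)
        return True


if __name__ == "__main__":
    watcher = DedupingWatcher()
    # The trigger keeps firing while the file is still present.
    for name in ["a.csv", "a.csv", "b.csv", "a.csv"]:
        created = watcher.handle(Event({"idempotency_key": name}))
        print(name, "run created" if created else "skipped")
```

   With something like this, the trigger could keep yielding events for as long as the file exists, and only the first event per key would result in a DAG run.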
   
   ### Use case/motivation
   
   I was looking to turn a DAG into one with event-driven scheduling and the 
[notice on infinite 
scheduling](https://airflow.apache.org/docs/apache-airflow/stable/authoring-and-scheduling/event-scheduling.html#avoid-infinite-scheduling)
 gave me real pause. It seems like the trigger itself needs to consume the 
message, whereas my expectation is that most data engineers would find it more 
natural for the trigger to detect the condition and for the DAG run to cause 
the condition to stop being true. For example, the trigger detects a file in a 
location, and the DAG does something with the file and then deletes it. Instead, 
what happens is that the [trigger has to delete the 
file](https://github.com/apache/airflow/blob/main/providers/standard/src/airflow/providers/standard/triggers/file.py#L127).
 
   
   ### Related issues
   
   #49857 would also be fixed by this. Let files pile up in a location with the 
DAG paused, turn the DAG back on, and suddenly you get an event per file.
   
   ### Are you willing to submit a PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [x] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
