seub commented on issue #22006:
URL: https://github.com/apache/airflow/issues/22006#issuecomment-2498607379

   > There is no need to pause anything - it is going to just work
   
   This is not true unless you make some strong assumptions:
   
   > Airflow's tasks are idempotent
   
   That's typically not true. Maybe your task is to add a row to a table in a 
database. If you run it twice, you'll have two rows. (Also, I'm not sure you 
know what idempotent means; it's not the most relevant property here.)
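
   To make the row example concrete, here is a minimal sketch. It uses an in-memory SQLite database, and the `events` table and `add_row` function are hypothetical names chosen for illustration, not anything from Airflow itself:

```python
import sqlite3


def add_row(conn: sqlite3.Connection, run_date: str) -> None:
    # Hypothetical task body: a plain INSERT appends a new row on every
    # execution, so running the task again duplicates the data.
    conn.execute("INSERT INTO events (run_date) VALUES (?)", (run_date,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (run_date TEXT)")

add_row(conn, "2024-11-25")
add_row(conn, "2024-11-25")  # the same task executed a second time

row_count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(row_count)  # 2: the table now holds two rows instead of one
```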
   
   > Theoretically (and practically) data in one DAG Run should not interfere 
with Data in another DAG Run - and they could be run in parallel
   
   There is absolutely no reason to assume that. Same example as above.
   
   Remember that Airflow is a job scheduler that can be used for a huge variety 
of workflows. There is absolutely no reason to make such strong assumptions 
as tasks being systematically idempotent, independent, backwards compatible 
when they are updated, etc. (And it's just ridiculous to call "unnatural" or 
"strange" any workflows that don't satisfy these assumptions.)
   
   Maybe you should study examples of what Airflow customers use it for. A 
typical example is updating data models in a data warehouse (e.g. by running 
the dbt tool). Such tasks are not idempotent, independent, or backwards 
compatible when the data model changes.
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
