seub commented on issue #22006: URL: https://github.com/apache/airflow/issues/22006#issuecomment-2498607379
> There is no need to pause anything - it is going to just work

This is not true without some strong assumptions that you are making:

> Airflow's tasks are idempotent

That's typically not true. Maybe your task is to add a row to a table in a database. If you run it twice, you'll have two rows. (Also, I'm not sure you know what idempotent means; it's not the most relevant property here.)

> Theoretically (and practically) data in one DAG Run should not interfere with Data in another DAG Run - and they could be run in parallel

There is absolutely no reason to assume that. Same example as above.

Remember that Airflow is a job scheduler that can be used for a huge variety of workflows. There is absolutely no reason to make such strong assumptions as tasks being systematically idempotent, independent, backwards compatible when they are updated, etc. (And it's just ridiculous to call "unnatural" or "strange" any workflows that don't satisfy these assumptions.)

Maybe you should study examples of what Airflow customers use it for. A typical example is updating data models in a data warehouse (e.g. by running the dbt tool). Such tasks are not idempotent, independent, or backwards compatible when the data model changes.
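To make the "add a row" example concrete, here is a minimal sketch using Python's standard `sqlite3` module. The table names (`events`, `daily`) and the helper functions are hypothetical and have nothing to do with Airflow's own API; they only contrast a plain append, which duplicates on re-run, with a write keyed on the run date, which does not.

```python
import sqlite3


def append_row(conn, run_date, value):
    # Plain INSERT: every execution adds another row, so a re-run duplicates data.
    conn.execute(
        "INSERT INTO events (run_date, value) VALUES (?, ?)", (run_date, value)
    )


def upsert_row(conn, run_date, value):
    # Keyed write: re-running for the same run_date replaces the existing row.
    conn.execute(
        "INSERT OR REPLACE INTO daily (run_date, value) VALUES (?, ?)",
        (run_date, value),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (run_date TEXT, value INTEGER)")
    conn.execute("CREATE TABLE daily (run_date TEXT PRIMARY KEY, value INTEGER)")
    # Simulate the same task instance being executed twice.
    for _ in range(2):
        append_row(conn, "2024-11-25", 1)
        upsert_row(conn, "2024-11-25", 1)
    appended = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    upserted = conn.execute("SELECT COUNT(*) FROM daily").fetchone()[0]
    conn.close()
    return appended, upserted


if __name__ == "__main__":
    print(demo())
```

The append-style task is perfectly legitimate; it just cannot be blindly re-run, which is the point being made above.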
