potiuk commented on issue #22006: URL: https://github.com/apache/airflow/issues/22006#issuecomment-2501021030
> This is not true unless some strong assumptions that you are making:

Well. Not really. Airflow already has the ability to run different DagRuns at the same time, so by default there is absolutely no way that one DagRun should impact another DagRun of the same DAG. They can run sequentially or in parallel, and other than explicitly setting `max_active_runs=1` and a few other task parameters, you have zero control over whether they are executed in parallel or not.

Operators are idempotent (and yes, I did not mix that up with independent) because this is how Airflow was designed initially. From https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/index.html:

> An operator represents a single, ideally idempotent, task. Operators determine what actually executes when your DAG runs.

And yes - while some operators are not idempotent, lack of idempotency breaks much of the functionality that Airflow was designed for (re-runs, clearing, backfills and so on). And it pretty much breaks "various DAG runs for the same DAG can be run in parallel".

> Remember that Airflow is a very flexible tool that is agnostic about what kind of tasks it runs and can be used for a huge variety of workflows. There is absolutely no reason to make such strong assumptions as tasks are idempotent, independent, backwards compatible when they are updated, etc.

Of course it is not needed in a number of cases. And I absolutely did not say anything about being backwards compatible (this is your addition). With the proposed architecture of Airflow 3 and DAG versioning, the assumption of backwards compatibility is precisely what is going away.

While currently you absolutely need DAG backwards compatibility in Airflow when you evolve a DAG (and this is why you need to pause it to upgrade the DAG in a non-compatible way), this is precisely the assumption that "full" DAG versioning is going to address: you will NOT need DAG backwards compatibility between runs.

IMHO the proposed change will work nicely as long as just the basic assumptions about DagRuns are kept. The operators do not even have to be idempotent in those cases, to be honest, but it helps with the mental model of DagRuns and schedule if they are. And this is by far the most common case for which Airflow is used today, and I still consider the other cases "niche" - my assessment here is still unchanged.

With Airflow 3.0 we are going a bit further indeed. For example, we are removing `execution_date`, which was there primarily for that case, and that will make DAG runs even more "independent" from the schedule. It will also increase the need for separation between DAG runs - which means that, almost by definition, you should be able to run one DAG run with one version of the code and another DAG run with another, without the need to pause the DAGs.

That's my assessment after participating in those discussions and even voting (as a PMC member) on the versioning AIP. I am not sure how much you were involved and how much you read and understood about how versioning works, but if you think DAG versioning will not support your case (I think it should), you should start a discussion on the devlist and in the AIP suggesting the changes needed to support it - because that was the primary reason why we proposed, voted on and are implementing DAG versioning.

> A typical example of what Airflow can be used for is updating a data model in a database (e.g. by running the dbt tool). Such tasks are not idempotent, nor independent, nor backwards compatible when updating the code.

Sure, but if you have DAG runs that do this, you have to set `max_active_runs=1`, otherwise they will start running in parallel. And in that case any such update to the DAG, including versioning, should be transparent for you, because you will never have one DAG run with old code and another DAG run with new code. The currently active DAG run will simply continue running with the old version (including all tasks that are scheduled to run for this particular DAG run), and all tasks for the "future" DAG run (which will not be running yet because of `max_active_runs=1`) will use the new code. So you will effectively achieve the same as pausing - without pausing.

I think you are still rooted too much in how Airflow 2 works, where you have absolutely no control over which version of the code will be used by which DAG run's tasks. In Airflow 3, when full versioning is implemented, this is precisely what is going to change.

But regardless - it does not really matter. What matters now is that this "feature" is up for grabs for anyone. If you think you want this feature, you can absolutely propose to implement it (in Airflow 3, because there will be no new feature release for Airflow 2). If no one else among the maintainers thinks it's worth implementing as part of the current Airflow 3 efforts (most maintainers are heads down working on those), then it needs **someone** to roll their sleeves up and propose and contribute an implementation of it (again, only in Airflow 3, because there will be no Airflow 2 feature release).

So my opinion on whether it's "niche" or not does not matter, as I am just one of the PMC members. What matters is whether there is enough will and enough hands to work on implementing this feature - also taking into account that likely (if my assessment is correct) DAG versioning will significantly decrease the need for this weird pausing/unpausing scheme in the first place.

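
As a rough illustration of the behaviour described above, here is a toy model in plain Python. It is not Airflow code and does not use any Airflow API: `ToyScheduler`, `ToyDagRun`, `deploy` and `demo` are hypothetical names made up for this sketch, and the "pin the version when the run starts" rule is only the assumption stated in the comment about how full DAG versioning is expected to work.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDagRun:
    """One DAG run, pinned to the code version that was current when it started."""
    run_id: str
    version: int
    finished: bool = False


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for a scheduler; NOT an Airflow API."""
    max_active_runs: int = 1
    current_version: int = 1
    runs: list = field(default_factory=list)
    queued: list = field(default_factory=list)

    def active_runs(self):
        return [r for r in self.runs if not r.finished]

    def request_run(self, run_id):
        # Queue the run; it only starts once a slot is free.
        self.queued.append(run_id)
        self._start_queued()

    def deploy(self, version):
        # Assumption: new code only affects runs that have not started yet.
        self.current_version = version

    def finish(self, run_id):
        for r in self.runs:
            if r.run_id == run_id:
                r.finished = True
        self._start_queued()

    def _start_queued(self):
        # Start queued runs while respecting the max_active_runs limit.
        while self.queued and len(self.active_runs()) < self.max_active_runs:
            self.runs.append(ToyDagRun(self.queued.pop(0), self.current_version))


def demo():
    s = ToyScheduler(max_active_runs=1)
    s.request_run("run_1")  # starts immediately on version 1
    s.request_run("run_2")  # has to wait: run_1 is still active
    s.deploy(2)             # DAG code is updated while run_1 is running
    s.finish("run_1")       # run_2 starts now and is pinned to version 2
    return {r.run_id: r.version for r in s.runs}
```

In this model `demo()` maps `run_1` to version 1 and `run_2` to version 2: no run ever mixes versions, and nothing had to be paused, which is the effect the paragraph above attributes to combining `max_active_runs=1` with full DAG versioning.
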
So - just to repeat - if you want to focus on this, implement it and make a PR, you are absolutely free to do it, but IMHO this feature is largely obsoleted by what's coming in Airflow 3, so you are unlikely to find an advocate among the people involved in Airflow 3 who will spend their time on it. But you might, if you want. And this is all I am saying.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
