I am currently on sick leave and still recovering - hoping to be able to travel to the US next week as planned - so I just wanted to break out of it to make one comment here.
My head is a bit clearer now, with the medications hopefully working. I am still taking them, which should help me get over my current state, and I wanted to look at how this discussion was unraveling first. Over the last week I disconnected from "day-to-day" Airflow and put some thought into it (as much as I could in my current state). This whole thread started from that: how do the current discussions on AIP-67 and others change if we consider Airflow 3 to be "starting"?

The price of back-compat is development speed and quality: more combinations to test, more unexpected issues uncovered, and the need to keep parallel paths (old/new) while adding new features. This is everything Constance wrote about and Ash explained. We have already started to trip over our own feet multiple times in the last few releases. Have we tested all deployment combinations in Airflow 2.8 and 2.9? Not really, and I think we already see that in a number of feature "combos" things are not working as stably as they did before.

Airflow 3 is a bold move. We risk users staying on Airflow 2 for a long time (or even moving away) because they will not want to migrate to Airflow 3. A lot of the work implemented in AIP-44 and the design of AIP-67 was done around back-compatibility, but yes - it would have been way easier to design them anew without back-compatibility in mind. And if we implement it and release it in Airflow 2, it will make new Airflow feature development even harder. That's why I wanted to treat it as a "tactical" solution, hoping that in Airflow 3 we can do it "properly". And that's why I started the discussion here when I sensed that we were "close" to an Airflow 3 discussion: I wanted to see what options we have there. This is also why I have not yet concluded the vote on AIP-67; I am waiting for the result of this discussion.

But if we are ready to go for Airflow 3, then I'd say there are two important things that should be part of the vision.

1) *We should be far more opinionated and have far fewer options for running things in Airflow 3*. Even an order of magnitude more opinionated. Make choices, stick to them, and perfect those opinionated choices to suit the 80/20 (or even 70/30, or maybe even 60/40) rule, if you will. This risks not fitting the 20% who might choose to stay on Airflow 2. We can choose now, deliberately, which ~20% of cases we do not want to handle. And we should be very, very strict about it. The default should be "no choice". This will radically simplify deployment and should make it easier to simplify Airflow development and the DAG authoring experience, because we will have fewer cases to support. Even if we plan to add more options in the future, the first version of Airflow 3 should support only one deployment approach. This is the only way we can deliver it fast. And we should be very bold there: choose one option and go for it in pretty much every place where we have choices now. We should aim for Airflow 3.0 to support only a subset of current users, but those who are most likely to migrate first and those with the biggest need for the new features. We can plan for 3.x to support more cases, but 3.0 should be as opinionated as humanly possible. And this deployment option should also be something ALL our stakeholders will feel OK with as a way forward in their offerings.

My candidates (and yes, some are bold):

* *Drop MySQL*. If there is a single thing that makes us avoid schema changes and DB migrations, this is it. Let's choose Postgres 15+ and use some of the great features there. This will also enable a much faster async SQL implementation and a number of other optimisations, not to mention cutting the development and testing time of every single change literally in half. And we should not look back at adding MySQL.
* *Drop Celery/Sequential Executor* and start with Local + K8S only (AWS/Google and others can of course continue developing theirs in parallel and continue the Hybrid executor work).
  Later, we can figure out a better solution to support "small" tasks using some new K8S features and possibly non-K8S solutions (Ray-based?).
* *Cut Connection and Variable management from the DB/UI*. Leave only Secrets Management. Later, when we have a 100% extensible React UI, we can add a "local DB secrets manager" add-on.
* *Choose a single way of DAG storage that supports versioning from day one*. Bear in mind we can add others later. Bolke's idea of using fsspec is an interesting one; we should see if it is feasible.
* *Drop FAB completely (including custom plugins) and invest in implementing an Auth Manager based on a dedicated, external solution* (Keycloak, which we've discussed before, is a likely candidate).
* *Leave Providers with Airflow 2 and add tests to make sure they are Airflow 3 future-compatible*. Develop a way to continue development of and contributions to Providers with Airflow 2, and add complete tests that run them with Airflow 3. This way we can continue developing Provider features independently and make them work for Airflow 2 (continuing to add features for Airflow 2 users alongside Airflow 2 bugfixes), while also gradually fixing any Airflow 3 incompatibilities. Instead of "back-compatibility" tests, we would have provider "forward-compatibility" tests, so that future Providers are tested and work on Airflow 3. It will also make it easiest to keep testing Airflow 2 (bugfixes) + Providers without investing in changes to the current CI / test harness.
* *Simplify the Test Harness for Airflow 3 from the start*. Without providers and the 790+ dependencies, we could vastly simplify Airflow 3 testing (basically build the CI jobs from scratch) using mostly standard Python tooling, while we continue to use the current test harness for Airflow 2 + Providers and extend it with Airflow 3 future-compatibility tests.
  That means Breeze would stay only in the Airflow 2 + Providers repo, as we should be able to achieve most of what we have there with local venv tooling (especially with uv as the underlying tool).

2) *I think we should add only very few new "important" features*: the absolute minimum to make Airflow 3 appealing, and add them only in Airflow 3. Versioning, multi-team and pluggable UI should be Airflow 3 only. It makes no sense to invest in Airflow 2 if we already know Airflow 3 is coming, since that generally triples the effort needed to get these features out. We should drop new feature development in Airflow 2. This will give users an incentive to move to 3 if the new features are worth it, even paying the compatibility/migration price.

Versioning, for example: I believe that if we decide to go only with Airflow 3 and cut some of the above (Postgres only, a single versioned DAG storage), we can make bolder decisions in versioning and support simpler models from the get-go (and deliver it faster). And we should add only a few, but important, features that our users clearly asked for, and focus on delivering Airflow 3 as soon as possible (instead of Airflow 2.10 or 2.11). Similarly, multi-team can be simplified if we cut things from the list above and make Task isolation a first-class citizen in Airflow (and the only option).

My candidates very much concur with the list shared by Kaxil in the doc, plus I'd add multi-team (but simplified thanks to the cuts). But here I would also mostly defer to the Astronomer, Google and AWS teams to define collectively the absolute minimum set of features that would make the "target" part of their customers happy. And ONLY do that.

So in short, I think a big part of our discussion should be what we are ready to drop when we start Airflow 3, and we should be very bold about it. Once we know that, we should figure out the absolute minimum of things we can add that will benefit a significant part of our users (making use of the increased speed we gain because we dropped things).

J.

On Mon, May 6, 2024 at 8:40 PM Constance Martineau <consta...@astronomer.io.invalid> wrote:
> Hi Michal,
>
> Thanks for your thoughts on the Airflow 3 proposal. I appreciate your concerns about the migration overhead for our users with a major new version, and I see the appeal in your suggestion to integrate many of the proposed changes into Airflow 2 through separate AIPs. It’s a valid point and certainly aligns with the value of making incremental improvements.
>
> However, after looking closely at the enhancements outlined for Airflow 3, I'm convinced they warrant a new major release. Here’s why:
>
> 1. *Core Architectural Changes:* We’re looking at foundational changes with Airflow 3, like redefining task priorities, separating task definition and task execution, and new AIPs like DAG versioning, remote execution and restricting database access from workers. These aren’t just incremental improvements but major shifts that will set the stage for the next decade of Airflow’s architecture. Grouping these changes into a major release will help us make these transitions more cleanly and with fewer constraints from past decisions.
> 2. *Code Clean-Up:* Our main branch has accumulated over 140 deprecated issues, and this will only grow if we continue without a major cleanup. This makes it increasingly difficult to implement new features effectively while maintaining backward compatibility. A major release allows us to address these issues head-on, reducing technical debt and paving the way for a more robust platform.
> 3. *Managing Breaking Changes:* Let’s take the example of restricting database access from workers. It’s a necessary move for better security and potentially also for scalability (it reduces DB load). Many users have workflows that interact with the DB, either by using raw SQL or by leveraging a session object.
>    We could implement this feature in Airflow 2 and avoid breaking existing workflows by keeping the old standard mode as the default (much of the work is already done), but that would mean supporting both the new secure mode and the old standard mode indefinitely, and designing new features on the assumption that most users will continue using the old standard mode. With Airflow 3, we can make secure mode the default or even the only option, simplifying implementation and future development. This is just one example of something that is feasible to implement in Airflow 2 but is better released in the context of Airflow 3.
> 4. *Future-Proofing for New Features:* Airflow 3 will open up possibilities for handling workflows beyond batch processing. Features like real-time DAG execution through the API and multi-language task support are big steps forward, significantly expanding Airflow’s utility.
>
> While integrating these updates into Airflow 2 might look less disruptive initially, the scale and nature of the required changes really support a move to Airflow 3. It’s not just about adding new features; it’s about setting up Airflow so that it remains relevant for the next ten years.
>
> Constance
>
> On Mon, May 6, 2024 at 2:10 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > There's a lot of technical debt hiding in Airflow, especially in the scheduler, that makes it harder and harder to efficiently add new features.
> >
> > At some point, very soon, we are going to have to remove some very infrequently used back-compat shims that negatively affect performance. Without doing that, the pace at which we can realistically add some of the more exciting features tends towards zero. Developer speed of contributors is a factor here too!
> >
> > So while we are still using SemVer, that necessitates v3.
> >
> > Ash
> >
> > On 6 May 2024 15:30:49 BST, "Michał Modras" <michalmod...@google.com.INVALID> wrote:
> > > +1 to Jens's & Bolke's points here and in the doc
> > >
> > > I agree we should work on clarifying the directions we would like Airflow to go. Introducing a new major Airflow version is a massive overhead for users, who would need to plan for migrations, onboarding the new Airflow (with a slightly different architecture), etc., and effectively Airflow 2 would live in parallel for a long time.
> > >
> > > Personally, I think most of the points in Kaxil's/Vikram's doc are valuable projects of their own, and I could imagine all of them being delivered as separate AIPs within Airflow 2 (surely as new minor versions of Airflow 2). I am not sure whether the scope of changes and the goal we want to achieve are a) clear enough and b) broad enough to call for a new major version.
> > >
> > > Best,
> > > Michal
> > >
> > > On Sun, May 5, 2024 at 10:10 AM Scheffler Jens (XC-AS/EAE-ADA-T) <jens.scheff...@de.bosch.com.invalid> wrote:
> > >
> > > > Thanks for the document write-up, Kaxil. I assume this is mostly a vision statement.
> > > >
> > > > Looking forward to a larger addendum where we can collect things that we can all vote and agree on as targets.
> > > >
> > > > As I started earlier with a Confluence page and it seems this is not accessible to all, shall we convert this to a Google Doc for better collaboration and item collection?
> > > >
> > > > ________________________________
> > > > From: Vikram Koka <vik...@astronomer.io.INVALID>
> > > > Sent: Sunday, May 5, 2024 3:34:33 AM
> > > > To: dev@airflow.apache.org <dev@airflow.apache.org>
> > > > Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs strategic (Airflow 3) approach
> > > >
> > > > Thank you for your feedback, Bolke and Andrey!
> > > >
> > > > Bolke,
> > > > I have replied to some of your comments in the doc.
> > > > I will provide a detailed write-up on the "Interactive DAG run" (or synchronous DAG run) capability, which has generated some early questions. I had intended to publish an AIP for that as a follow-up, but I believe a simpler write-up would be useful ahead of the AIP.
> > > >
> > > > Andrey,
> > > > You raise an interesting point.
> > > >
> > > > As part of the Airflow 2.0 release, we as a community decided to strictly adhere to SemVer as detailed in the document you referenced. We also consciously split the "Core Airflow" releases from the "Provider" releases at that time. We had a clear expectation then for the cadence of both minor and patch releases, which we have generally adhered to since.
> > > >
> > > > Personally, I am more concerned about our Provider releases right now than about the cadence of our major releases. I believe that one of the proposed changes in the Airflow 3 document, i.e. the clear separation of Task Execution, will help here, but more may be needed.
> > > >
> > > > Definitely interested in more feedback on this as well.
> > > >
> > > > Vikram
> > > >
> > > >
> > > > On Sat, May 4, 2024 at 10:57 AM Andrey Anshin <andrey.ans...@taragol.is> wrote:
> > > >
> > > > > I would like to propose changing (or at least discussing) the release policy around major versions of Airflow.
> > > > >
> > > > > Right now it is described as "These releases do not happen with any regular interval or on any predictable schedule."
> > > > > https://airflow.apache.org/docs/apache-airflow/stable/release-process.html#term-Major-release
> > > > >
> > > > > So maybe it is time to make it schedulable, e.g. once every two years or so. This could help us avoid discussions like "We don't know when Airflow 4 is coming." in the future. Once a new major version is released, new features would no longer be added to the old major version; however, we would support bug/security fixes for a while, e.g. 1 year for bug fixes and 3 years for security fixes, with a total 5-year lifecycle per major version. These are just approximate time periods for defining the current period, the bugfix period and the security-fix period.
> > > > >
> > > > > From the contributors' perspective, it helps with dropping deprecated stuff, which resolves an old problem: we have to support everything, including deprecated stuff, and without a schedulable lifecycle for deprecated stuff, it could be a showstopper for new features, because sometimes it is hard to support two different approaches for a long period of time with no hope that removal will happen soon. For some fundamental stuff that does not take much time to support, we could postpone removal until the major release after the next one, e.g.
> > > > > deprecate in Airflow 3 but remove it in Airflow 5.
> > > > >
> > > > > From the user perspective, they have bug fix support for at least a while; if someone wants to use a legacy version, it is their choice, but with no new features and no new provider versions (after one year).
> > > > >
> > > > > ----
> > > > > Best Wishes
> > > > > *Andrey Anshin*
> > > > >
> > > > > On Sat, 4 May 2024 at 19:17, Bolke de Bruin <bdbr...@gmail.com> wrote:
> > > > >
> > > > > > I have left several comments :-). And on interactive DAG runs, even after Vikram's explanation, I still don't have a clue what we want to accomplish there :-P.
> > > > > >
> > > > > > I would like to see a mantra or theme for Airflow 3. That helps nudge people in the same direction. Suggestions are in the comments.
> > > > > >
> > > > > > Bolke
> > > > > >
> > > > > > > On 4 May 2024, at 01:14, Vikram Koka <vik...@astronomer.io.invalid> wrote:
> > > > > > >
> > > > > > > Good point, Jed.
> > > > > > > I responded to your comment in the doc as well, and I am very open to changing the term in the doc.
> > > > > > >
> > > > > > > I used the term "interactive DAG run" to mean the ability to invoke or trigger a DAG run through the API with the expectation of getting back a result immediately. An alternative term could be "synchronous DAG run".
> > > > > > >
> > > > > > > Regardless, this is a significant change, so a good term to indicate the expansion from "batch runs only" is warranted. Very open to different terms here.
> > > > > > >
> > > > > > > > On Fri, May 3, 2024 at 4:05 PM Jed Cunningham <jedcunning...@apache.org> wrote:
> > > > > > > >
> > > > > > > > Very exciting! Looks like we will have a busy period of time ahead of us.
> > > > > > > > Overall I like the plan so far, especially using this year's Airflow Summit as an opportunity to announce and gather feedback, and the 2025 version to pitch upgrading.
> > > > > > > >
> > > > > > > > I left a comment in the doc, but we might want to iterate on the terminology we use for high-priority or "synchronous" DAG runs to serve LLM responses - I find "interactive DAG runs" a bit confusing.