I am currently on sick leave, and still recovering - hoping to be able to
travel next week to the US as planned, so I just wanted to break out of it
to make one comment here.

My head is a bit clearer now, with the medication hopefully working. I am
still taking it, which should help me get over the current state, and I
wanted to look at how this discussion is unfolding first. Over the last week
I disconnected from "day-to-day" Airflow and put some thought (as much as
I could in my current state) into it. This whole thread started from exactly
that question - how do the current discussions on AIP-67 and others change
if we consider that Airflow 3 is "starting"?

The price for back-compat is speed of development and quality: more
combinations to test, more unexpected issues uncovered, and the necessity to
keep parallel paths (old/new) while adding new features. This is all of what
Constance wrote about and what Ash explained. We have already tripped over
our own feet multiple times in the last few releases. Have we tested all
deployment combinations in Airflow 2.8 and 2.9? Not really. I think we
already see that in a number of feature "combos" things are not working as
stably as they did before.

Airflow 3 is a bold move. We risk that users will stay on Airflow 2 for a
long time (or even move away) because they will not want to migrate to
Airflow 3. A lot of the work implemented in AIP-44 and the design of AIP-67
was done around back-compatibility. But yes - it would have been way easier
if designed anew without back-compatibility in mind. And if we implement it
and release it in Airflow 2, it will make new Airflow feature development
even harder. That's why I wanted to treat it as a "tactical" solution -
hoping that in Airflow 3 we can do it "properly" - and that's why I started
the discussion here when I sensed that we are "close" to an Airflow 3
discussion, because I wanted to see what options we have there. This is why
I have not yet concluded voting on AIP-67: I am waiting for the result of
this discussion.

But if we are ready to go for Airflow 3 then I'd say there are two
important things that should be part of the vision.

1) *We should be far more opinionated and have far fewer options of
running things in Airflow 3*. Even an order of magnitude more opinionated.
Make choices, stick to them, and perfect those opinionated choices to suit
the 80/20 (or even 70/30 or maybe even 60/40) rule if you will, at the risk
of not fitting the 20% who might choose to stay on Airflow 2. We can choose
now which ~20% of cases we deliberately do not want to handle. And we should
be very, very strict about it. The default should be "no choice". This will
radically simplify deployment and should make it easier to simplify Airflow
development and the DAG authoring experience, because we will have fewer
cases to support. Even if we plan to add more options in the future, the
first version of Airflow 3 should support one deployment approach only. This
is the only way we can deliver it fast. And we should be very bold there:
choose one option and go for it in pretty much every place where we have
choices now. We should aim for Airflow 3.0 to support only a subset of
current users - those who are most likely to migrate first and those with
the biggest need for the new features. We can think about 3.x supporting
more cases, but 3.0 should be as opinionated as humanly possible.

And this deployment option should also be something ALL our stakeholders
will feel OK with as a way forward in their offerings.

My candidates (and yes, some are bold):

* *Drop MySQL*. If there is a single thing that makes us avoid schema
changes and DB migrations - this is it. Let's choose Postgres 15+ and use
some of the great features there. This will also enable a much faster async
SQL implementation and a number of other optimisations - not to mention
cutting the development and testing time of every single change by
literally half. And we should not look back at adding MySQL.
* *Drop Celery/Sequential Executor* and start with Local + K8S only (AWS,
Google and others can of course continue developing theirs in parallel and
continue the Hybrid executor work). Later we can figure out a better
solution to support "small" tasks using some new K8S features and possibly
non-K8S solutions (Ray-based?).
* *Cut Connection and Variable Management from DB/UI*. Leave only Secrets
Management. Later when we have a 100% extensible React UI, we can add a
"local DB secrets manager" add-on
* *Choose a single way for DAG storage that will support versioning from
day one*. Bear in mind we can add others later. Bolke's idea of using
FSspec is an interesting one, we should see if it is feasible.
* *Drop FAB completely (including custom plugins) and invest in
implementing an Auth Manager based on a dedicated, external solution*
(Keycloak, which we've discussed before, is a likely candidate).
* *Leave Providers with Airflow 2 and add tests to make sure they are
Airflow 3 future-compatible* - develop a way where we continue development
and contributions for Providers with Airflow 2 and add complete tests to
run them with Airflow 3. This way we can continue developing Provider
features independently and make them work for Airflow 2 (and continue
adding features for Airflow 2 users alongside Airflow 2 bugfixes), while
also gradually fixing any Airflow 3 incompatibilities. Instead of
"back-compatibility" tests we would write provider "forward-compatibility"
tests, so that future Providers are tested and work on Airflow 3. This will
also make it easiest to continue Airflow 2 (bugfixes) + Providers testing
without investing in changing the current CI / test harness.
* *Simplify the Test Harness for Airflow 3 from the start* - without
providers and 790+ dependencies, we could vastly simplify Airflow 3 testing
(basically make CI jobs from scratch) using mostly standard Python tooling
(while we can continue making use of the current test harness for Airflow 2
+ Providers and extend it with Airflow 3 future-compatibility tests). That
means Breeze would stay only in the Airflow 2 + Providers repo, as we should
be able to achieve most of what we have there with local venv tooling
(especially with uv as the underlying tooling).
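
To make the "fewer combinations" argument concrete, here is a rough sketch
of how a test matrix shrinks when options are cut. The option lists below
are illustrative assumptions loosely based on the candidates above, not
Airflow's actual CI matrix, and `deployment_combinations` is a hypothetical
helper, not an existing Airflow function.

```python
from itertools import product

# Hypothetical option lists, for illustration only (not Airflow's real CI matrix).
airflow2_axes = {
    "database": ["postgres", "mysql"],
    "executor": ["local", "sequential", "celery", "kubernetes"],
    "auth": ["fab", "external"],
}
# The same axes after the cuts proposed in the list above (again, assumed).
airflow3_axes = {
    "database": ["postgres"],
    "executor": ["local", "kubernetes"],
    "auth": ["external"],
}


def deployment_combinations(axes):
    """Return every combination that picks exactly one option per axis."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*axes.values())]


before = deployment_combinations(airflow2_axes)
after = deployment_combinations(airflow3_axes)
print(len(before), len(after))
```

With these assumed lists the matrix goes from 16 combinations to 2, and
dropping one of two databases alone already halves it, since the total is
simply the product of the number of options on each axis.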

2) *I think we should add only very few new "important" features.* The
absolute minimum to make Airflow 3 appealing, and add them only in Airflow 3:
versioning, multi-team and pluggable UI should be Airflow 3 only. It makes no
sense to invest in Airflow 2 if we already know Airflow 3 is coming - that
generally triples the effort needed to get them out. We should drop new
feature development in Airflow 2. This will give users an incentive to move
to 3 if the new features are worth it, even at the price of
compatibility/migration work.

Versioning, for example: I believe that if we decide to go only with
Airflow 3 and cut some of the above (Postgres only, a single versioned DAG
storage) we can make bolder decisions in versioning and support simpler
models from the get-go (and deliver it faster). And we should add only a few
- but important - features that our users clearly asked for and focus on
delivering Airflow 3 as soon as possible (instead of Airflow 2.10 or 2.11).
Similarly, multi-team can be simplified if we cut things from the list
above and have Task isolation as a first-class citizen in Airflow (and the
only option).

My candidates very much concur with the list shared by Kaxil in the doc, and
I'd add multi-team (but simplified thanks to the cuts). But here I would
also mostly defer to the Astronomer, Google and AWS teams to define
collectively what is the absolute minimum set of features that would make
the "target" part of their customers happy. And ONLY do that.

So in short - I think the big part of our discussion should be what we are
ready to drop when we start Airflow 3, and we should be very bold. Once we
know that, we should figure out the absolute minimum of things we can add
that will benefit a significant part of our users (and make use of the
increased speed we gain because we dropped things).

J.


On Mon, May 6, 2024 at 8:40 PM Constance Martineau
<consta...@astronomer.io.invalid> wrote:

> Hi Michal,
>
> Thanks for your thoughts on the Airflow 3 proposal. I appreciate your
> concerns about the migration overhead for our users with a major new
> version and see the appeal in your suggestion to integrate many of the
> proposed changes into Airflow 2 through separate AIPs. It’s a valid point
> and certainly aligns with the value of making incremental improvements.
>
> However, after looking closely at the enhancements outlined for Airflow 3,
> I'm convinced they warrant a new major release. Here’s why:
>
>    1. *Core Architectural Changes:* We’re looking at foundational changes
>    with Airflow 3 - like redefining task priorities, separating task
>    definition and task execution, and new AIPs like DAG versioning, remote
>    execution and restricting database access from workers. These aren’t
>    just incremental improvements but major shifts that will set the stage
>    for the next decade of Airflow’s architecture. Grouping these changes
>    into a major release will help us make these transitions more cleanly
>    and with fewer constraints from past decisions.
>    2. *Code Clean-Up*: Our main branch has accumulated over 140 deprecated
>    issues, and this will only grow if we continue without a major cleanup.
>    This makes it increasingly difficult to implement new features
>    effectively while maintaining backward compatibility. A major release
>    allows us to address these issues head-on, reducing technical debt and
>    paving the way for a more robust platform.
>    3. *Managing Breaking Changes:* Let’s take the example of restricting
>    database access from workers. It’s a necessary move for better security
>    and also potentially for scalability reasons (it reduces DB load). Many
>    users have workflows that interact with the DB, either by using raw SQL
>    or by leveraging a session object. We could implement this feature in
>    Airflow 2 and avoid breaking existing workflows by continuing to have
>    the old standard mode as default - much of the work is already done -
>    but that would mean supporting both the new secure mode and the old
>    standard mode indefinitely and designing new features with the
>    assumption that most will continue using the old standard mode. With
>    Airflow 3, we can make secure mode the default or even the only option,
>    simplifying implementation and future development. This is just one
>    example where it is feasible to implement in Airflow 2, but is better if
>    we release it under the context of Airflow 3.
>    4. *Future-Proofing for New Features:* Airflow 3 will open up
>    possibilities for handling workflows beyond batch processing. Features
>    like real-time DAG execution through API and multi-language task support
>    are big steps forward, significantly expanding Airflow’s utility.
>
>
> While integrating these updates into Airflow 2 might look less disruptive
> initially, the scale and nature of the required changes really support a
> move to Airflow 3. It’s not just about adding new features; it’s about
> setting up Airflow so that it continues to remain relevant for the next ten
> years.
>
> Constance
>
> On Mon, May 6, 2024 at 2:10 PM Ash Berlin-Taylor <a...@apache.org> wrote:
>
> > There's a lot of technical debt hiding in Airflow, especially the
> > scheduler, that makes it harder and harder to efficiently add new
> > features.
> >
> > At some point, very soon, we are going to have to remove some very
> > infrequently used back compat shims that negatively affect performance.
> > Without doing that the pace at which we can realistically add some of the
> > more exciting features tends towards zero. Developer speed of
> > contributors is a factor here too!
> >
> > So while we are still using SemVer, that necessitates v3.
> >
> > Ash
> >
> > On 6 May 2024 15:30:49 BST, "Michał Modras"
> > <michalmod...@google.com.INVALID> wrote:
> > >+1 to Jens's & Bolke's points here and in the doc
> > >
> > >I agree we should work on clarifying the directions we would like
> > >Airflow to go. Introducing a new major Airflow version is a massive
> > >overhead for users, who would need to plan for migrations, onboarding
> > >the new Airflow (with a slightly different architecture), etc., and
> > >effectively Airflow 2 would live in parallel for a long time.
> > >
> > >Personally, I think most of the points in Kaxil's/Vikram's doc are
> > >valuable projects of their own, and I could imagine all of them being
> > >delivered as separate AIPs within Airflow 2 (surely new minor versions
> > >of Airflow 2). I am not sure if the scope of changes and the goal we
> > >want to achieve is a) clear enough b) broad enough to call for a new
> > >major version.
> > >
> > >Best,
> > >Michal
> > >
> > >On Sun, May 5, 2024 at 10:10 AM Scheffler Jens (XC-AS/EAE-ADA-T)
> > ><jens.scheff...@de.bosch.com.invalid> wrote:
> > >
> > >> Thanks for the document write-up, Kaxil. I assume this is mostly a
> > >> vision statement.
> > >>
> > >> Looking forward to a larger addendum where we can collect things that
> > >> we all can vote and agree on as targets.
> > >>
> > >> As I started earlier with a confluence page and it seems this is not
> > >> accessible to all, shall we convert this to a Google Doc for better
> > >> collaboration and item collection?
> > >>
> > >> Sent from Outlook for iOS<https://aka.ms/o0ukef>
> > >> ________________________________
> > >> From: Vikram Koka <vik...@astronomer.io.INVALID>
> > >> Sent: Sunday, May 5, 2024 3:34:33 AM
> > >> To: dev@airflow.apache.org <dev@airflow.apache.org>
> > >> Subject: Re: [HUGE DISCUSSION] Airflow3 and tactical (Airflow 2) vs
> > >> strategic (Airflow 3) approach
> > >>
> > >> Thank you for your feedback, Bolke and Andrey!
> > >>
> > >> Bolke,
> > >> I have replied to some of your comments in the doc.
> > >> I will provide a detailed write up on the "Interactive DAG run" (or
> > >> synchronous DAG run) capability, which has generated some early
> > >> questions. I had intended to get an AIP published for that as a
> > >> follow-up, but I believe that a simpler write up would be useful ahead
> > >> of the AIP.
> > >>
> > >> Andrey,
> > >> You raise an interesting point.
> > >>
> > >> As part of the Airflow 2.0 release, we as a community had decided to
> > >> strictly adhere to Semver as detailed in the document you referenced.
> > >> We also consciously split out the "Core Airflow" releases from the
> > >> "Provider" releases at that time. We had a clear expectation then for
> > >> the cadence of both minor and patch releases, which we have generally
> > >> adhered to since then.
> > >>
> > >> Personally, I am more concerned about our Provider releases right now,
> > >> as compared to the cadence of our major releases. I believe that one of
> > >> the proposed changes in the Airflow 3 document i.e. the clear
> > >> separation for Task Execution will help here, but more may be needed.
> > >>
> > >> Definitely interested in more feedback on this as well.
> > >>
> > >> Vikram
> > >>
> > >>
> > >> On Sat, May 4, 2024 at 10:57 AM Andrey Anshin
> > >> <andrey.ans...@taragol.is> wrote:
> > >>
> > >> > I would like to propose changing (or at least discussing) the release
> > >> > policy around the Major version of Airflow.
> > >> >
> > >> > Right now it is described as "These releases do not happen with any
> > >> > regular interval or on any predictable schedule.":
> > >> >
> > >> > https://airflow.apache.org/docs/apache-airflow/stable/release-process.html#term-Major-release
> > >> >
> > >> > So maybe it is time to make it schedulable, e.g. once every two years
> > >> > or so. This could help us avoid such discussions in the future, like
> > >> > "We don't know when Airflow 4 is coming." Once a new major version is
> > >> > released, new features would no longer be added to the old major
> > >> > version, but we would keep supporting it with bug and security fixes
> > >> > for a while, e.g. 1 year for bug fixes, 3 years for security fixes
> > >> > with a total 5 year lifecycle per major version. These are just
> > >> > approximate time periods to define the current period, bugfix period
> > >> > and security fix period.
> > >> >
> > >> > From the contributors' perspective it helps with dropping deprecated
> > >> > stuff, which resolves an old problem: we have to support everything
> > >> > including deprecated stuff, and without a schedulable lifecycle for
> > >> > the deprecated stuff it can be a showstopper for new features,
> > >> > because sometimes it is hard to support two different approaches for
> > >> > a long period of time with no hope that removal will happen soon. For
> > >> > some fundamental stuff that does not take a lot of time to support,
> > >> > we could postpone removal to the release after next, e.g. deprecate
> > >> > in Airflow 3, but remove it in Airflow 5.
> > >> >
> > >> > From the user perspective, they have at least bug fix support for a
> > >> > while. If someone wants to use a legacy version it is their choice,
> > >> > but with no new features and no new versions of providers (after one
> > >> > year).
> > >> >
> > >> >
> > >> > ----
> > >> > Best Wishes
> > >> > *Andrey Anshin*
> > >> >
> > >> >
> > >> >
> > >> > On Sat, 4 May 2024 at 19:17, Bolke de Bruin <bdbr...@gmail.com>
> > >> > wrote:
> > >> >
> > >> > > I have left several comments :-). And on interactive dag runs, even
> > >> > > after the explanation of Vikram, I still don't have a clue what we
> > >> > > want to accomplish there :-P.
> > >> > >
> > >> > > I would like to see a mantra or team for Airflow 3. That helps
> > >> > > nudging people in the same direction. Suggestions in the comments.
> > >> > >
> > >> > > Bolke
> > >> > > Sent from my iPhone
> > >> > >
> > >> > > > On 4 May 2024, at 01:14, Vikram Koka
> > >> > > > <vik...@astronomer.io.invalid> wrote:
> > >> > > >
> > >> > > > Good point Jed.
> > >> > > > I responded back to your comment in the doc as well and very open
> > >> > > > to changing the term in the doc.
> > >> > > >
> > >> > > > Used the term "interactive DAG run" as the ability to invoke or
> > >> > > > trigger a DAG run through the API, with the expectation of getting
> > >> > > > back a result immediately. An alternate term could be a
> > >> > > > "synchronous DAG run".
> > >> > > >
> > >> > > > Regardless, this is a significant change so a good term to
> > >> > > > indicate the expansion from "batch runs only" is warranted. Very
> > >> > > > open to different terms here.
> > >> > > >
> > >> > > >> On Fri, May 3, 2024 at 4:05 PM Jed Cunningham
> > >> > > >> <jedcunning...@apache.org> wrote:
> > >> > > >>
> > >> > > >> Very exciting! Looks like we will have a busy period of time
> > >> > > >> ahead of us. Overall I like the plan so far, especially using
> > >> > > >> this year's Airflow Summit as an opportunity to announce and
> > >> > > >> gather feedback, and the 2025 version to pitch upgrading.
> > >> > > >>
> > >> > > >> I left a comment in the doc, but we might want to iterate on the
> > >> > > >> terminology we use for high priority or "synchronous" DAG runs to
> > >> > > >> serve LLM responses - I find "interactive DAG runs" a bit
> > >> > > >> confusing.
> > >> > > >>
