Re: [DISCUSS]: Proposal to focus on polishing Airflow 3

Bishundeo, Rajeshwar Tue, 14 Oct 2025 08:06:25 -0700

+1 on focusing on stability for AF3, great proposal Vikram! 

Additionally, in line with Jens' comments wrt:
>(Assuming that the "partial" features like deadline alerts and things that are 
>marked as experimental of course continue to complete)


I think we should continue working on these items and I would like to include 
Multi-team in that list (also experimental) as these were all frequently 
discussed topics at the AF summit last week.

-- Rajesh 






On 2025-10-12, 11:24 PM, "Dheeraj Turaga" <[email protected]> wrote:


As promised during the Airflow Summit, I’m sharing our experience migrating
from Airflow 2.10 to 3.1 and overall, we’re thrilled with the upgrade.
EdgeExecutor and Triggering User Name were the main drivers for us—they’ve
solved big pain points in our workflows. Thank you for making these
features available!


How We Migrated


We spun up a new Airflow 3 instance alongside Airflow 2, cleaned up DAG
parsing issues, ran UAT, and then retired Airflow 2 during a maintenance
window.


Why We Upgraded


- EdgeExecutor: Critical for running tasks across multiple data centers.
- Triggering User Name: Huge for us—most runs are manual, and knowing
who triggered a DAG is essential. In Airflow 2, we had to duplicate DAGs
per user, which was messy and resource-heavy.


Migration Challenges


- Almost all DAGs needed updates. Import changes were easy, but I wasn’t
aware of Ruff helper scripts at the time and migrated manually.
- Lack of user-facing DAG debug tools was tough. Commands like airflow
dags list and list-import-errors now rely on the DB, so users can’t easily
validate DAGs without running a processor or standalone mode. We built a
custom CI script using DagBag, but it broke during 3.0.6–3.1 due to class
changes.
- No way to estimate DAG parse time without running the dag
processor—large dynamic DAGs (1000+ tasks) caused bottlenecks.
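
A minimal sketch of the kind of standalone check described above, assuming
only the Python standard library: it imports each file under a DAG folder,
records how long the top-level import takes, and collects any exception
raised. It is not Airflow's DagBag or dag processor and will not match their
timings or error handling exactly; `check_dag_files` and its return shape are
hypothetical names chosen for illustration, and the DAG files' own
dependencies (including Airflow) are assumed to be installed wherever it runs.

```python
import importlib.util
import time
import traceback
from pathlib import Path


def check_dag_files(dag_folder):
    """Import every .py file under dag_folder.

    Returns (timings, errors):
      timings: file path -> seconds spent executing the module top level
      errors:  file path -> final line of the traceback for failed imports
    Hypothetical helper; a rough stand-in for a real DAG parse.
    """
    timings = {}
    errors = {}
    for index, path in enumerate(sorted(Path(dag_folder).rglob("*.py"))):
        # Unique throwaway module name so files with the same stem do not clash.
        spec = importlib.util.spec_from_file_location(f"_dag_check_{index}", path)
        module = importlib.util.module_from_spec(spec)
        start = time.perf_counter()
        try:
            spec.loader.exec_module(module)
        except Exception:
            # Keep only the last traceback line, e.g. "ImportError: ...".
            errors[str(path)] = traceback.format_exc().strip().splitlines()[-1]
        finally:
            timings[str(path)] = time.perf_counter() - start
    return timings, errors
```

In CI, a non-empty `errors` dict could fail the build, and sorting `timings`
by value would surface the slowest files to import.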


Issues Observed


- Retry-delay bug: Scheduler crashed during 3.0.6 → 3.1 migration;
required manual patch.
- Grid View: Very slow for large DAGs (2000+ tasks); improved in 3.1 but
still slower than 2.10.
- Mobile UI: DAG page, logs, clear/try actions broken in Airflow 3;
critical for global teams working odd hours.
- Task Ordering: Broken in 3.1; I have an open PR to fix this.
- Graph View: Unusable for large DAGs with task groups; tasks scattered
infinitely (fixed in 3.0.6).
- Rendered Template: Formatting issues in 3.0; fixed in 3.1.
- Logs:
  - Scrolling large logs is slow; 3.1 added “jump to end” but still
    sluggish.
  - Minified React issue on overview page with large logs (fixed in
    3.0.6).
  - Missing option to remove “source” line (fixed in 3.1).


Suggestions for Improvement


- DAG Validation Tools: Provide user-facing utilities to check import
errors and estimate parse time without running a processor.
- Mobile UI: Restore functionality for DAG actions and log viewing.
- Grid View Performance: Optimize for large DAGs with thousands of tasks.
- Log Viewer: Improve scrolling speed and usability for large logs.
- Git DAG Bundles: Add robust retry logic for connectivity issues; bare
clone corruption currently breaks all tasks.
- Mapped Index & Rendered Templates: Show mapped index and rendered name
before task runs; render even if task fails.
- Search Capabilities: Add more filters for DAGRun and Task Instances
(e.g., hostname) to help diagnose host-specific failures.
- API Server Metrics: Expose metrics for API request load to help plan
horizontal scaling.






Airflow 3 is a big step forward for us—these suggestions are aimed at
making it even better for large-scale deployments. Thanks for all the hard
work!






Cheers,
Dheeraj Turaga


On Mon, Oct 6, 2025 at 3:11 PM Kaxil Naik <[email protected]> wrote:


> That's awesome, let's catch up, Dheeraj - looking forward to hearing about
> your migration experience
>
> On Mon, 6 Oct 2025 at 10:59, Dheeraj Turaga <[email protected]>
> wrote:
>
> > +1, I strongly agree. As someone who migrated our org to 3.1 last week
> > (and was an initial adopter of 3.0 as well), I have been pushing bug
> > fixes as I find them and as my user base reports them. I felt the
> > migration was bumpy and have some notes regarding this.
> >
> > Looking forward to sharing my experience with you all at the summit!
> >
> > On Mon, Oct 6, 2025 at 7:53 PM Pavankumar Gopidesu <[email protected]>
> > wrote:
> >
> > > Yeah, very strong +1. Let's make the next release a super stable version.
> > >
> > > Regards
> > > Pavan
> > >
> > > On Mon, 6 Oct 2025 at 18:36, Kaxil Naik <[email protected]> wrote:
> > >
> > > > Strong +1, thanks Vikram for the proposal.
> > > >
> > > > Dedicated time for this is essential.
> > > >
> > > > On Mon, 6 Oct 2025 at 01:22, Aritra Basu <[email protected]> wrote:
> > > >
> > > > > Very valid points from both of you! I am all in on this as well. I
> > > > > think it's been all cylinders firing for a little while now with
> > > > > making Airflow 3 feature rich. Taking some time to clean up the
> > > > > features would be great!
> > > > >
> > > > > --
> > > > > Regards,
> > > > > Aritra Basu
> > > > >
> > > > > On Mon, 6 Oct 2025, 10:23 am Jarek Potiuk, <[email protected]> wrote:
> > > > >
> > > > > > I am very much for it!
> > > > > >
> > > > > > > And I would even add a little - it would be great if pretty much
> > > > > > > all of us got involved - even (and especially) in the areas where
> > > > > > > we have not been involved so far.
> > > > > >
> > > > > > > There are a number of areas - both new, and "changed old" - where
> > > > > > > I think there is a small number of "experts" (basically those who
> > > > > > > worked on it), but others have limited visibility or understanding
> > > > > > > of a) new areas b) scope of changes (I am speaking from my own
> > > > > > > experience here as well). And we are somewhat shying away not only
> > > > > > > from attempting to fix things, but also even from triaging,
> > > > > > > responding to and interacting with users who are raising issues.
> > > > > > > Thus many issues are untriaged. I think we got a bit more "siloed"
> > > > > > > in our parts with Airflow 3 development and we need to break the
> > > > > > > silos a bit.
> > > > > >
> > > > > > > There might be a few reasons:
> > > > > > >
> > > > > > > - we do not feel competent enough to help
> > > > > > > - somehow we feel "the others who implemented it" are responsible
> > > > > > > for fixing those
> > > > > > > - we have "our" parts that we are looking at and focusing on (this
> > > > > > > is I think the biggest part especially for those "experts" who
> > > > > > > might feel overwhelmed - if we look elsewhere, we might have a
> > > > > > > feeling that "our" part will be lagging behind)
> > > > > > > - those "experts" on the other hand might feel overloaded with a
> > > > > > > number of issues in their specific area and have a hard time
> > > > > > > getting someone to help them
> > > > > >
> > > > > > > I think ideally, we need more of the community engagement here -
> > > > > > > and likely "experts" taking more of a role of brainstorming and
> > > > > > > guiding other contributors, committers, PMC members to help,
> > > > > > > following their advice and oversight in solving the issues. That
> > > > > > > would not only be an opportunity to fix things potentially faster
> > > > > > > (after initial ramp-up time) but also turn such a "polishing"
> > > > > > > period into a knowledge transfer. Ultimately it's not one or two
> > > > > > > people who are responsible for some "areas" in Airflow, but the
> > > > > > > whole community is. And those "experts" might even find time to
> > > > > > > help in "other" areas if they are less burdened with working on
> > > > > > > solutions down to a green PR in their area of expertise.
> > > > > >
> > > > > > > And also I think that "help" thing extends to the users who raised
> > > > > > > their issues (some of them undoubtedly listening here) - we will
> > > > > > > need their help in at least testing solutions and commenting on
> > > > > > > hypotheses.
> > > > > >
> > > > > > > Maybe we can figure out a way of working (commenting on issues,
> > > > > > > triaging approach, issue solving attempt, way of asking for help)
> > > > > > > that will "catalyse" such knowledge transfer?
> > > > > >
> > > > > > > But I also might be wrong in my assessment - so I'd love to hear
> > > > > > > what others might say here - maybe also have some proposals how we
> > > > > > > could reorganise to handle open issues better (and to handle some
> > > > > > > of the challenges involved). Undoubtedly such knowledge transfer
> > > > > > > has some risks that solving issues will slow down - at least
> > > > > > > initially, so we have to be rather careful with this approach and
> > > > > > > have a clear boundary of trust from the experts that things will
> > > > > > > be solved when they are guiding someone.
> > > > > >
> > > > > > J.
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On Sun, Oct 5, 2025 at 8:13 PM Vikram Koka via dev <[email protected]>
> > > > > > > wrote:
> > > > > >
> > > > > > > Dear Airflowers,
> > > > > > >
> > > > > > > > I am looking forward to meeting many of you this coming week at
> > > > > > > > the Airflow Summit. It will be wonderful to connect in person
> > > > > > > > after a year of online collaboration since the last Summit.
> > > > > > >
> > > > > > > > I’d like to put a proposal in front of all of you. We’re sure to
> > > > > > > > hear valuable feedback from users who have adopted or are
> > > > > > > > adopting Airflow 3. My proposal is that we dedicate October, the
> > > > > > > > four weeks following the Summit, to polishing work rather than
> > > > > > > > new feature development.
> > > > > > >
> > > > > > > > This would mean focusing on smoothing out any rough edges in the
> > > > > > > > adoption journey and making it easier for users to take full
> > > > > > > > advantage of the new capabilities we’ve released. Depending on
> > > > > > > > the aggregated feedback, we can also consider multiple patch
> > > > > > > > releases during this period to quickly incorporate improvements.
> > > > > > >
> > > > > > > As part of this, let's make sure feedback is easy to track:
> > > > > > >
> > > > > > > > - System of record: Use GitHub issues
> > > > > > > > <https://github.com/apache/airflow/issues> as the source of
> > > > > > > > truth, even if there is a conversation over Slack or on the dev
> > > > > > > > list.
> > > > > > > > - Version labelling: Include the Airflow version so it can be
> > > > > > > > labeled appropriately (label:affected_version either 3.0 or
> > > > > > > > 3.1), easily reproduced and resolved.
> > > > > > > > - Upgrade blockers: Indicate if this affects upgrades from 2.x.
> > > > > > > > We have been labeling and tracking these separately.
> > > > > > > > - Documentation vs. code: Indicate if this is a documentation
> > > > > > > > gap, rather than a code problem.
> > > > > > > > - Context: Airflow's flexibility allows for a wide range of
> > > > > > > > behavior. With Airflow 3's architectural changes, especially the
> > > > > > > > new TaskSDK model, some implicit behaviors may now need to be
> > > > > > > > explicitly specified. If you found anything confusing or
> > > > > > > > frustrating, please let us know if a documentation update,
> > > > > > > > upgrade script change, or a clarifying example would be helpful.
> > > > > > >
> > > > > > > > We are looking for active participation from everyone, including
> > > > > > > > those who haven't contributed before. Even a small contribution
> > > > > > > > such as a clear reproduction scenario, a documentation
> > > > > > > > improvement, or a simple upgrade script update can make a big
> > > > > > > > difference.
> > > > > > >
> > > > > > > Thank you and best regards,
> > > > > > > Vikram
> > > > > > > --
> > > > > > >
> > > > > > > Vikram Koka
> > > > > > > Chief Strategy Officer
> > > > > > > > Email: [email protected]
> > > > > > >
> > > > > > >
> > > > > > > > <https://www.astronomer.io/>
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
