Hey everyone,

Thank you for attending the dev call on the 12th of March. I updated
our meeting notes on the Airflow wiki and the link for those notes is here
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=373886699#Airflow3.xDevCall:Meetingnotes-Summary.33>

To everyone who attended the meeting, please check the summary and add
anything I may have missed. For those who could not join, please let us
know if you disagree with anything discussed and agreed upon in
the meeting. Also, please do ask questions if something is unclear.

Our next meeting is scheduled for the 26th of March at the same time.
Please note that due to the US daylight saving time change, this time may
be off by an hour for your time zone. It is scheduled for 9 a.m. Pacific
Time on the 26th of March.

The agenda is already populated, primarily with Airflow 3.2 AIP updates. If
you would like to use this call to discuss a particular topic, please let
me know so that it can be added to the agenda
<https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=373886699#Airflow3.xDevCall:Meetingnotes-26March2026>.

Best regards,
Vikram
--
Below is the summary from the last call:

   - Catch-up on action items from last call
      - Helm chart discussions update (Bugra Ozturk)
         - Bugra shared that Jarek, Jed, Jens, and others had met to
         discuss the Helm chart roadmap.
         - The high-level agreement was to continue with minor releases
         until all deprecations are complete and breaking changes are fully
         documented, before proceeding to a 2.0 release.
         - A follow-up meeting is planned for next week, with notes and
         action items to be shared on the dev list, including versioning
         details and security measures.
         - A concrete plan is expected in about three weeks, with the
         overall project timeline measured in months rather than weeks
      - Airflow 3.2 Development updates:
      - Testing / Release Manager Update (Rahul Vats)
         - Rahul gave a 3.2 test plan update. Beta 1 was cut last week and
         testing is actively underway.
         - Regression testing is looking good overall, including the Task
         SDK changes.
         - There are close to 20 migrations in 3.2, making this both
         important and risky; a few edge cases have already been identified
         and issues raised.
         - One notable concern is that the Deadline Alerts migration tasks
         appear likely to break with large datasets due to complex migration
         logic.
         - There are also some migration challenges with SQLite currently
         being worked through. Open beta issues are being tracked here
         <https://github.com/apache/airflow/issues?q=state%3Aopen%20label%3A%22affected_version%3A3.2.0beta%22>.
         - The current plan is Beta 2 for Monday and RC1 for the week of
         March 20th. Rahul asked for community help with testing.
      - UI / API swim lane update (Pierre Jeambrun / Brent Bovenzi)
         - Pierre reported great activity from the community with lots of
         fixes merged.
         - Progress is being made on the optimization front and things are
         on a good track, with more work still to do.
         - Asset Partitions UI work is also making good progress.
      - Asset Partitions (Wei Lee / Tzu-ping Chung)
         - The only remaining open items are documentation and a couple of
         issues to fix. The work is otherwise nearly complete.
      - Deadline Alerts (Dennis Ferruzzi)
         - Dennis has a PR open for review and asked Ash and Amogh
         specifically for a look.
      - Multi-team (Niko Oliveira / Vincent Beck)
         - The team is in a bug-finding and bug-fixing phase and things are
         looking good for RC.
      - Python Async Operator docs (David Blain)
         - Documentation is in flight.
      - OTel update (Daniel Standish)
         - Daniel flagged that there are OTel interface changes in 3.2.
         - The existing OTel packages were never formally documented or
         intended as stable public interfaces. Christos chimed in to support
         this.
         - A lazy consensus email has been sent by Daniel to the dev list
         to deprecate the existing interfaces from 2.10 onwards.
   - Discussion topic: Feedback from Airflow 2.11→3.1.7 Migration at
   Bosch (Marco Küttelwesch
   <https://cwiki.apache.org/confluence/display/~automationdev85> / Jens
   Scheffler <https://cwiki.apache.org/confluence/display/~jscheffl>)
      - Jens and Marco gave a detailed and very valuable presentation on the
      challenges their team at Bosch faced upgrading a large, complex Airflow
      environment.
      - The team expressed gratitude for this real-world feedback.
      - Issues they saw prior to migration
         - *Previous integrations we had with direct DB access* - workaround
         is to have a parallel DB connection until better solution(s) exist
            - Context does not provide the status of other tasks in the Dag
            for error handling/summary
            - Triggering and monitoring other Dags not possible w/o DB
            connection (needed for scaling)
            - Setting of Dag and Task Notes not possible --> Contribution to
            Task SDK was not (yet) accepted
            - Automation which adjusts Pools and Queues for dedicated machine
            routing w/o DB access
            - Custom Dag-level archiving w/o DB access
         - *Surprises from breaking APIs* that required some migration helpers:
            - XCom for SkipMixin has different semantics, and the key name
            changed in BranchOperator
            - XCom does not accept pathlib.Path objects anymore
            (==PosixPath())
            - Some other dataclass serialization problems in XCom
         - *Python code in VenvOperator* does not see code from the Dag folder
         anymore --> Fixed by adding PYTHONPATH manually
         - *Triggerer did not see custom code* from deferred sensor anymore,
         needed to tweak via adding to PYTHONPATH (Kafka Message Filter)
         - *Airflow Python API breaking changes,* needed to create a tool
         that can support AF2+3 backends to allow changing the backend w/o an
         integrated app change in parallel
         - Issue with XCom retrieval of Mapped task with mapping==1, not
         returning a list (Fix PR is open in parallel)
         - Our patched and beloved "custom footer", which showed Git sync
         status, could not be ported; an alternative via Plugin is still in
         the works
         - Markdown in Trigger form UI was interpreted differently compared
         to AF2, UI glitches needed correction
         - Needed to change a lot of Dags interpreting "execution date" ->
         "logical date"
         - Trigger form failed with None as default in an enum with optional
         field (fix coming in AF3.1.8)
         - FAB provider OAuth integration issues, mixing user contexts (Fixed
         in current provider)
         - Session cookie across applications, mixing different user
         contexts on multi-site deployments (Fixed in 3.1.8)
         - Needed to re-write an MS EventGrid plugin receiving push events
         (Flask->FastAPI App)
         - Task log URL generation needed to change
         - Link to listing of "All failed tasks of all runs of a Dag with a
         Run-ID prefix" no longer exists
      - Issues faced during or post migration
         - *Dag processor failed because of "unstable rendered Dags"* -->
         Version increase --> Full table scan in TI --> Flaky Dags --> Fixed
         by DB index on dag_version_id (Upstream contribution pending)
         - *Schedulers were not healthy because of unstable Dags* (after
         fixing Dag parser): required consistency reconciliation when running
         in a large queue and rotating with every Dag parse; all Schedulers
         locked the DB for reconciliation, workers and API server were locked
         out of access, and heartbeats failed (Upstream contribution pending)
            - Hard to see which background task, and which Dag, caused the
            problem until debug code was injected; no metrics on scheduler
            details!
         - *We see some Postgres DB locks on task_instances* (~5min, root
         cause unclear until now) - tasks fail as heartbeat every 5s, retry
         after 5s, API server running out of DB connections, kills Pods in
         liveness as DB connection pool exhausted by DDOS from workers
         - *JWT token timeout of 10min is way too short* if tasks stay in the
         Celery queue; such failures produce empty logs and are hard to find
         - *Dags previously limited via "max_active_tasks" were running
         almost uncontrolled*, a lot of complaints from affected groups
         suffering from lack of capacity
         - *Happily running tasks were killed by re-assignment of Tasks in
         Celery after 1h* (w/o any config change in Celery!), need to
         disable LATE_ACK
         - Bad Unicode chars killed jobs on DockerOperator --> Fix contributed
         - Bad Unicode chars kill KPO jobs on triggerer --> PR in review in
         parallel (was closed because of AI Slop 2 times…)
         - Celery worker needing 50% more RAM for same concurrency (8GB->12GB
         for 16 tasks), generated a couple of OOMs
         - Local patch to count deferred task instances as running (PR in
         Airflow open, in discussion in parallel)
         - Redirect after login broken if user not authenticated (Fix in
         3.1.8)
         - (new) Validation that run id prefix matches run-type
         - *Warning: We still see DB locks lasting for 5+ minutes in
         production and are still looking for the root cause*
      - Open items we see as gaps post migration
         - *UI: Ability to batch clear/mark as failed Dag runs* (Target:
         Workaround via script) → #63854
         <https://github.com/apache/airflow/issues/63854> + #63855
         <https://github.com/apache/airflow/issues/63855>
         - *UI: Presentation of Task Notes is hard*, proposal was rejected...
         planning to pitch an AIP soon as a new feature for "Dag/Task
         Summaries for Humans"
         - *UI: Scrolling in large logs is as bad as in GitHub* - initial
         loading is faster but scrolling through in total is worse than in
         AF2. Also, as logs are only partially loaded, no browser search is
         possible --> Search option in panel?
         - UI (requested but coming in 3.2.0): Grid filter left panel by
         run status (running, failed, success at least)
         - UI (requested but coming in 3.2.0): Better filtering in Dag runs /
         Task Instances
         - UI (requested but coming in 3.2.0): Re-run Dag with previous config
         - UI nit: Cannot click a mapped task that is scheduled, as the link
         is missing
         - UI nit bug: Failure to load right panel when clicking on a task
         group on the left in the grid
         - UI nit: Color coding of test/development instances is missing -
         hope this is coming with 3.2.0 theming...
         - UI nit: Admin / Pool view gives a bad overview if 20+ Pools are
         listed
         - UI nit: complaints that failed and skipped task colors are too
         similar --> will adjust locally with custom theming in 3.2.0?
         - UI nit: a lot of user bookmarks broken/404 (e.g.
         http://host/airflow/dags/<my_dag>/grid)
      - Compensated by very positive feedback in regards to
         - Now it is directly possible to see who triggered a Dag
         - URLs to logs finally working (was a pain with Grid in AF2)
         - UI: Much more modern than AF2!
         - UI: Translations are welcome!
         - HITL opens new options for approving runs, for which there is
         actual demand ATM
         - All users enjoy dark mode!




--

Vikram Koka
Chief Strategy Officer
Email: [email protected]


<https://www.astronomer.io/>
