Re: [DISCUSS] How to document DB access options in Airflow 3 upgrade docs

Amogh Desai Thu, 06 Nov 2025 23:29:17 -0800

I will be waiting for responses on this discussion before creating a lazy
consensus till *Tue, Nov 11, 3:00 PM UTC*


So, if you have thoughts, feel free to chime in now :)

Thanks & Regards,
Amogh Desai


On Fri, Nov 7, 2025 at 4:57 AM Buğra Öztürk <[email protected]> wrote:

> Great initiative Amogh, thanks! I agree with others on 1 and not
> encouraging for 2 as well.
>
> Idea of filling the gaps with adding more endpoints would enable more
> automation with a secure environment in the long run. In addition, we can
> consider providing some more granular clean up/db functionality on CLI too
> where those could be automated on server side with Admin commands and not
> from Dags, just an idea.
>
> I hope we will add airflowctl there soon, of course with limited
> opwrations. 🤞
>
> Bugra Ozturk
>
> On Thu, 6 Nov 2025, 14:32 Amogh Desai, <[email protected]> wrote:
>
> > Looking for some more eyes on this one.
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> >
> > On Thu, Nov 6, 2025 at 12:55 PM Amogh Desai <[email protected]>
> wrote:
> >
> > > > Yes, API could do this with 5-times more code including the limits
> per
> > > response where you need to loop over all pages until you have a full
> > > list (e.g. API limited to 100 results). Not impossible but a lot of
> > > re-implementation.
> > >
> > > Just wondering, why not vanilla task mapping?
> > >
> > > > Might be something that could be a potential contributionto "airflow
> db
> > > clean"
> > >
> > > Maybe, yes.
> > >
> > > Thanks & Regards,
> > > Amogh Desai
> > >
> > >
> > > On Thu, Nov 6, 2025 at 12:53 PM Amogh Desai <[email protected]>
> > wrote:
> > >
> > >> > I think our efforts should be way more focused on adding some
> missing
> > >> API
> > >> calls in Task SDK that our users miss, rather than in allowing them to
> > use
> > >> "old ways". Every time someone says "I cannot migrate because i did
> > this",
> > >> our first thought should be:
> > >>
> > >> * is it a valid way?
> > >> * is it acceptable to have an API call for it in SDK?
> > >> * should we do it ?
> > >>
> > >>
> > >> That is currently a grey zone we need to define better I think.
> Certain
> > >> use cases might be general
> > >> enough that we need an execution API endpoint for that, and we can
> > >> certainly do that. But there will
> > >> also be cases when the use case is niche and we will NOT want to have
> > >> execution API endpoints
> > >> for that for various reasons. The harder problem to solve is the
> latter.
> > >>
> > >> But you make a fair point here.
> > >>
> > >>
> > >>
> > >> Thanks & Regards,
> > >> Amogh Desai
> > >>
> > >>
> > >> On Thu, Nov 6, 2025 at 2:33 AM Jens Scheffler <[email protected]>
> > >> wrote:
> > >>
> > >>> > Thanks for your comments too, Jens.
> > >>> >
> > >>> >>    * Aggregate status of tasks in the upstream of same Dag (pass,
> > >>> fail,
> > >>> >>      listing)
> > >>> >>
> > >>> >> Does the DAG run page not show that?
> > >>> Partly yes, but in our environment it is a bit more complex than
> > >>> "pass/fail". Bit more complex story, we want to know more details of
> > the
> > >>> failed and aggregate details. So high-level saying get the XCom from
> > >>> failed and then aggregate details. Imagine all tasks ahve an owner
> and
> > >>> we want to send a notification to each owner but if 10 tasks from one
> > >>> owner fail we want to send 1 notification with 10 failed in the text.
> > >>> And, yes, can be done via API.
> > >>> >>    * Custom mass-triggering of other dags and collection of
> results
> > >>> from
> > >>> >>     triggered dags as scale-out option for dynamic task mapping
> > >>> >>
> > >>> >> Can't an API do that?
> > >>> Yes, API could do this with 5-times more code including the limits
> per
> > >>> response where you need to loop over all pages until you have a full
> > >>> list (e.g. API limited to 100 results). Not impossible but a lot of
> > >>> re-implementation.
> > >>> >>    * And the famous: Partial database clean on a per Dag level
> with
> > >>> >>      different retention
> > >>> >>
> > >>> >> Can you elaborate this one a bit :D
> > >>>
> > >>> Yes. We have some Dag that is called 50k-100k times per day and
> others
> > >>> that are called 12 times a day. And a lot of others in-between like
> 25k
> > >>> runs per month. The Dag with 100k runs per day we want to archive
> ASAP
> > >>> probably after 3 days for all not failed calls to reduce DB overhead.
> > >>> The failed ones we keep for 14 days for potential re-processing if
> > there
> > >>> was an outage.
> > >>>
> > >>> Most other Dag Runs we keep for a month. And some we cap that we
> > archive
> > >>> if more than 25k runs
> > >>>
> > >>> Might be something that could be a potential contributionto "airflow
> db
> > >>> clean"
> > >>>
> > >>> >>
> > >>> >> Thanks & Regards,
> > >>> >> Amogh Desai
> > >>> >>
> > >>> >>
> > >>> >> On Wed, Nov 5, 2025 at 3:12 AM Jens Scheffler <
> [email protected]>
> > >>> wrote:
> > >>> >>
> > >>> >> Thanks Amough for adding docs for migration hints.
> > >>> >>
> > >>> >> We actually suffer a lot of integrations that had been built in
> the
> > >>> past
> > >>> >> which now makes it hard and serious effort to migrate to version
> 3.
> > So
> > >>> >> most probably we ourself need to take option 2 but knowing (like
> in
> > >>> the
> > >>> >> past) that you can not ask for support. But at least this
> un-blocks
> > us
> > >>> >> from staying with 2.x
> > >>> >>
> > >>> >> I'd love to take route 1 as well but then a lot of code needs to
> be
> > re
> > >>> >> written. This will take time, And in mid term we will migrate to
> > (1).
> > >>> >>
> > >>> >> As in the dev call I'd love if in Airflow 3.2 we could have
> option 1
> > >>> >> supported out-of-the-box - knowing that some security discussion
> is
> > >>> >> implied, so maybe need to be turned on and not be enabled by
> > default.
> > >>> >>
> > >>> >> The use cases we have and which requires some kind of DB access
> > where
> > >>> >> TaskSDK is not helping with support
> > >>> >>
> > >>> >>    * Adding task and dag run notes to tasks as better readable
> > status
> > >>> >>      while and after execution
> > >>> >>    * Aggregate status of tasks in the upstream of same Dag (pass,
> > >>> fail,
> > >>> >>      listing)
> > >>> >>    * Custom mass-triggering of other dags and collection of
> results
> > >>> from
> > >>> >>      triggered dags as scale-out option for dynamic task mapping
> > >>> >>    * Adjusting Pools based on available workers
> > >>> >>    * Checking results of pass/fail per edge worker and depending
> on
> > >>> >>      stability adjusting Queues on Edge workers based on status
> and
> > >>> >>      errors of workers
> > >>> >>    * Adjust Pools based on time of day
> > >>> >>    * And the famous: Partial database clean on a per Dag level
> with
> > >>> >>      different retention
> > >>> >>
> > >>> >> I would be okay removing option 3 and a clear warning to option 2
> is
> > >>> >> also okay.
> > >>> >>
> > >>> >> Jens
> > >>> >>
> > >>> >> On 11/4/25 13:06, Jarek Potiuk wrote:
> > >>> >>> My take (and details can be found in the discussion):
> > >>> >>>
> > >>> >>> 2. Don't make the impression it is something that we will
> support -
> > >>> and
> > >>> >>> explain to the users that it **WILL** break in the future and
> it's
> > on
> > >>> >>> **THEM** to fix when it breaks.
> > >>> >>>
> > >>> >>> The 2 is **kinda** possible but we should strongly discourage
> this
> > >>> and
> > >>> >> say
> > >>> >>> "this will break any time and it's you who have to adapt to any
> > >>> future
> > >>> >>> changes in schema" - we had a lot of similar cases in the past
> > where
> > >>> our
> > >>> >>> users felt entitled to get **something** they felt as "valid way
> of
> > >>> using
> > >>> >>> things" broken by our changes. If we say "recommended" they will
> > >>> take it
> > >>> >> as
> > >>> >>> "and all the usage there is expected to work when Airlfow gets a
> > new
> > >>> >>> version so I should be fully entitled to open a valid issue when
> > >>> things
> > >>> >>> change".  I think "recommended" in this case is far too strong
> from
> > >>> our
> > >>> >>> side.
> > >>> >>>
> > >>> >>> 3. Absolutely remove.
> > >>> >>>
> > >>> >>> Sounds like we are going back to Airflow 2 behaviour. And we've
> > made
> > >>> all
> > >>> >>> the effort to break out of that. Various things will start
> breaking
> > >>> in
> > >>> >>> Airflow 3.2 and beyond. Once we complete the task isolation work,
> > >>> Airflow
> > >>> >>> workers will NOT have sqlalchemy package installed by default -
> it
> > >>> simply
> > >>> >>> will not be task-sdk dependency. The fact that you **can** use
> > >>> sqlalchemy
> > >>> >>> now is mostly a by-product of the fact that we have not completed
> > the
> > >>> >> split
> > >>> >>> yet - but it was not even **SUPPOSED** to work.
> > >>> >>>
> > >>> >>> J.
> > >>> >>>
> > >>> >>>
> > >>> >>>
> > >>> >>> On Tue, Nov 4, 2025 at 10:03 AM Amogh Desai<
> [email protected]>
> > >>> >> wrote:
> > >>> >>>> Hi All,
> > >>> >>>>
> > >>> >>>> I'm working on expanding the Airflow 3 upgrade documentation to
> > >>> address
> > >>> >> a
> > >>> >>>> frequently asked question from users
> > >>> >>>> migrating from Airflow 2.x: "How do I access the metadata
> database
> > >>> from
> > >>> >> my
> > >>> >>>> tasks now that direct database
> > >>> >>>> access is blocked?"
> > >>> >>>>
> > >>> >>>> Currently, Step 5 of the upgrade guide[1] only mentions that
> > direct
> > >>> DB
> > >>> >>>> access is blocked and points to a GitHub issue.
> > >>> >>>> However, users need concrete guidance on migration options.
> > >>> >>>>
> > >>> >>>> I've drafted documentation via [2] describing three approaches,
> > but
> > >>> >> before
> > >>> >>>> proceeding to finalising this, I'd like to get community
> > >>> >>>> consensus on how we should present these options, especially
> given
> > >>> the
> > >>> >>>> architectural principles we've established with
> > >>> >>>> Airflow 3.
> > >>> >>>>
> > >>> >>>> ## Proposed Approaches
> > >>> >>>>
> > >>> >>>> Approach 1: Airflow Python Client (REST API)
> > >>> >>>> - Uses `apache-airflow-client` [3] to interact via REST API
> > >>> >>>> - Pros: No DB drivers needed, aligned with Airflow 3
> architecture,
> > >>> >>>> API-first
> > >>> >>>> - Cons: Requires package installation, API server dependency,
> auth
> > >>> token
> > >>> >>>> management, limited operations possible
> > >>> >>>>
> > >>> >>>> Approach 2: Database Hooks (PostgresHook/MySqlHook)
> > >>> >>>> - Create a connection to metadata DB and use DB hooks to execute
> > SQL
> > >>> >>>> directly
> > >>> >>>> - Pros: Uses Airflow connection management, simple SQL interface
> > >>> >>>> - Cons: Requires DB drivers, direct network access, bypasses
> > >>> Airflow API
> > >>> >>>> server and connects to DB directly
> > >>> >>>>
> > >>> >>>> Approach 3: Direct SQLAlchemy Access (last resort)
> > >>> >>>> - Use environment variable with DB connection string and create
> > >>> >> SQLAlchemy
> > >>> >>>> session directly
> > >>> >>>> - Pros: Maximum flexibility
> > >>> >>>> - Cons: Bypasses all Airflow protections, schema coupling,
> manual
> > >>> >>>> connection management, worst possible option.
> > >>> >>>>
> > >>> >>>> I was expecting some pushback regarding these approaches and
> there
> > >>> were
> > >>> >>>> (rightly) some important concerns raised
> > >>> >>>> by Jarek about Approaches 2 and 3:
> > >>> >>>>
> > >>> >>>> 1. Breaks Task Isolation - Contradicts Airflow 3's core promise
> > >>> >>>> 2. DB as Public Interface - Schema changes would require release
> > >>> notes
> > >>> >> and
> > >>> >>>> break user code
> > >>> >>>> 3. Performance Impact - Using Approach 2 creates direct DB
> access
> > >>> and
> > >>> >> can
> > >>> >>>> bring back Airflow 2's
> > >>> >>>> connection-per-task overhead
> > >>> >>>> 4. Security Model Violation - Contradicts documented isolation
> > >>> >> principles
> > >>> >>>> Considering these comments, this is what I want to document now:
> > >>> >>>>
> > >>> >>>> 1. Approach 1 - Keep as primary/recommended solution (aligns
> with
> > >>> >> Airflow 3
> > >>> >>>> architecture)
> > >>> >>>> 2. Approach 2 - Present as "known workaround" (not
> recommendation)
> > >>> with
> > >>> >>>> explicit warnings
> > >>> >>>> about breaking isolation, schema not being public API,
> performance
> > >>> >>>> implications, and no support guarantees
> > >>> >>>> 3. Approach 3 - Remove entirely, or keep with strongest possible
> > >>> >> warnings
> > >>> >>>> (would love to hear what others think for
> > >>> >>>> this one particularly)
> > >>> >>>>
> > >>> >>>> Once we arrive at some discussion points on this one, I would
> like
> > >>> to
> > >>> >> call
> > >>> >>>> for a lazy consensus for posterity and visibility
> > >>> >>>> of the community.
> > >>> >>>>
> > >>> >>>> Looking forward to your feedback!
> > >>> >>>>
> > >>> >>>> [1]
> > >>> >>>>
> > >>> >>>>
> > >>> >>
> > >>>
> >
> https://github.com/apache/airflow/blob/main/airflow-core/docs/installation/upgrading_to_airflow3.rst#step-5-review-custom-operators-for-direct-db-access
> > >>> >>>> [2]https://github.com/apache/airflow/pull/57479
> > >>> >>>> [3]https://github.com/apache/airflow-client-python
> > >>> >>>>
> > >>>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: [email protected]
> > >>> For additional commands, e-mail: [email protected]
> > >>>
> > >>>
> >
>

Re: [DISCUSS] How to document DB access options in Airflow 3 upgrade docs

Reply via email to