My take (and details can be found in the discussion):

2. Don't give the impression that this is something we will support - and
explain to users that it **WILL** break in the future and that it's on
**THEM** to fix it when it breaks.

Approach 2 is **kinda** possible, but we should strongly discourage it and say
"this can break at any time, and it's you who has to adapt to any future
changes in the schema" - we had a lot of similar cases in the past where our
users felt entitled to a fix when **something** they saw as a "valid way of
using things" was broken by our changes. If we say "recommended" they will
take it as "all the usage there is expected to keep working when Airflow gets
a new version, so I am fully entitled to open a valid issue when things
change". I think "recommended" in this case is far too strong from our
side.

3. Absolutely remove.

This sounds like going back to Airflow 2 behaviour, and we've made every
effort to break away from that. Various things will start breaking in
Airflow 3.2 and beyond. Once we complete the task isolation work, Airflow
workers will NOT have the sqlalchemy package installed by default - it simply
will not be a task-sdk dependency. The fact that you **can** use sqlalchemy
now is mostly a by-product of the fact that we have not completed the split
yet - but it was never even **SUPPOSED** to work.
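
To make the "not installed by default" point concrete, here is a minimal
sketch of the kind of guard that task code relying on sqlalchemy would need
once the split is done. The helper names (`package_available`,
`require_package`) are hypothetical and not part of Airflow or the task-sdk;
the sketch only uses the standard library and assumes nothing about how
workers are actually packaged.

```python
import importlib.util


def package_available(name: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(name) is not None


def require_package(name: str) -> None:
    """Fail early with a clear message instead of an ImportError mid-task.

    Hypothetical helper: the caller, not Airflow, owns this dependency.
    """
    if not package_available(name):
        raise RuntimeError(
            f"'{name}' is not installed in this worker environment; "
            "it is not guaranteed by Airflow and must be installed "
            "and maintained by the user."
        )


if __name__ == "__main__":
    # Whether this prints True or False depends entirely on the environment.
    print("sqlalchemy importable:", package_available("sqlalchemy"))
```

Even with a guard like this, the schema itself can still change underneath
the query, so it only covers the packaging half of the problem.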

J.



On Tue, Nov 4, 2025 at 10:03 AM Amogh Desai <[email protected]> wrote:

> Hi All,
>
> I'm working on expanding the Airflow 3 upgrade documentation to address a
> frequently asked question from users
> migrating from Airflow 2.x: "How do I access the metadata database from my
> tasks now that direct database
> access is blocked?"
>
> Currently, Step 5 of the upgrade guide[1] only mentions that direct DB
> access is blocked and points to a GitHub issue.
> However, users need concrete guidance on migration options.
>
> I've drafted documentation via [2] describing three approaches, but before
> proceeding to finalising this, I'd like to get community
> consensus on how we should present these options, especially given the
> architectural principles we've established with
> Airflow 3.
>
> ## Proposed Approaches
>
> Approach 1: Airflow Python Client (REST API)
> - Uses `apache-airflow-client` [3] to interact via REST API
> - Pros: No DB drivers needed, aligned with Airflow 3 architecture,
> API-first
> - Cons: Requires package installation, API server dependency, auth token
> management, limited operations possible
>
> Approach 2: Database Hooks (PostgresHook/MySqlHook)
> - Create a connection to metadata DB and use DB hooks to execute SQL
> directly
> - Pros: Uses Airflow connection management, simple SQL interface
> - Cons: Requires DB drivers, direct network access, bypasses Airflow API
> server and connects to DB directly
>
> Approach 3: Direct SQLAlchemy Access (last resort)
> - Use environment variable with DB connection string and create SQLAlchemy
> session directly
> - Pros: Maximum flexibility
> - Cons: Bypasses all Airflow protections, schema coupling, manual
> connection management, worst possible option.
>
> I was expecting some pushback regarding these approaches and there were
> (rightly) some important concerns raised
> by Jarek about Approaches 2 and 3:
>
> 1. Breaks Task Isolation - Contradicts Airflow 3's core promise
> 2. DB as Public Interface - Schema changes would require release notes and
> break user code
> 3. Performance Impact - Using Approach 2 creates direct DB access and can
> bring back Airflow 2's
> connection-per-task overhead
> 4. Security Model Violation - Contradicts documented isolation principles
>
> Considering these comments, this is what I want to document now:
>
> 1. Approach 1 - Keep as primary/recommended solution (aligns with Airflow 3
> architecture)
> 2. Approach 2 - Present as "known workaround" (not recommendation) with
> explicit warnings
> about breaking isolation, schema not being public API, performance
> implications, and no support guarantees
> 3. Approach 3 - Remove entirely, or keep with strongest possible warnings
> (would love to hear what others think for
> this one particularly)
>
> Once we arrive at some discussion points on this one, I would like to call
> for a lazy consensus for posterity and visibility
> of the community.
>
> Looking forward to your feedback!
>
> [1] https://github.com/apache/airflow/blob/main/airflow-core/docs/installation/upgrading_to_airflow3.rst#step-5-review-custom-operators-for-direct-db-access
> [2] https://github.com/apache/airflow/pull/57479
> [3] https://github.com/apache/airflow-client-python
>
