BTW, a general comment: I think our efforts should be far more focused on adding the missing API calls to the Task SDK that our users need, rather than on allowing them to keep using the "old ways". Every time someone says "I cannot migrate because I did this", our first thoughts should be:

* Is it a valid way?
* Is it acceptable to have an API call for it in the SDK?
* Should we do it?

I think that, more often than not, the answer to all three questions will be "yes", and that is the way to handle most cases where people find it difficult to migrate.

J.

On Wed, Nov 5, 2025 at 7:18 AM Amogh Desai wrote:
>
> > Sounds like we are going back to Airflow 2 behaviour. And we've made all
> > the effort to break out of that. Various things will start breaking in
> > Airflow 3.2 and beyond. Once we complete the task isolation work, Airflow
> > workers will NOT have the sqlalchemy package installed by default - it
> > simply will not be a task-sdk dependency. The fact that you **can** use
> > sqlalchemy now is mostly a by-product of the fact that we have not
> > completed the split yet - but it was not even **SUPPOSED** to work.
>
> You make a valid point! My aim was to give users an upgrade window (we
> already provide one) so that they can safely migrate their workflows to the
> Airflow Python Client where possible. But giving them DB access would be
> time travel back to the *insecure* past, so I am OK with removing option 3.
>
> Thanks for your comments too, Jens.
>
> > * Aggregate status of tasks in the upstream of the same Dag (pass, fail,
> > listing)
>
> Does the DAG run page not show that?
>
> > * Custom mass-triggering of other dags and collection of results from
> > triggered dags as a scale-out option for dynamic task mapping
>
> Can't an API do that?
>
> > * And the famous: Partial database cleanup on a per-Dag level with
> > different retention periods
>
> Can you elaborate on this one a bit? :D
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Wed, Nov 5, 2025 at 3:12 AM Jens Scheffler wrote:
> >
> > Thanks Amogh for adding docs with migration hints.
> >
> > We actually suffer from a lot of integrations that were built in the past,
> > which now make migrating to version 3 hard and a serious effort. So
> > most probably we ourselves need to take option 2, knowing (as in the
> > past) that we cannot ask for support. But at least this unblocks us so
> > we are not stuck on 2.x.
> >
> > I'd love to take route 1 as well, but then a lot of code needs to be
> > rewritten. This will take time, and in the mid term we will migrate to (1).
> >
> > As discussed in the dev call, I'd love it if Airflow 3.2 supported option 1
> > out of the box - knowing that some security discussion is implied, so it
> > may need to be explicitly turned on rather than enabled by default.
> >
> > The use cases we have that require some kind of DB access, and where the
> > Task SDK does not help, are:
> >
> > * Adding task and Dag run notes as a more readable status during and
> >   after execution
> > * Aggregate status of tasks in the upstream of the same Dag (pass, fail,
> >   listing)
> > * Custom mass-triggering of other dags and collection of results from
> >   triggered dags as a scale-out option for dynamic task mapping
> > * Adjusting Pools based on available workers
> > * Checking pass/fail results per Edge worker and, depending on
> >   stability, adjusting Queues on Edge workers based on the status and
> >   errors of workers
> > * Adjusting Pools based on time of day
> > * And the famous: Partial database cleanup on a per-Dag level with
> >   different retention periods
> >
> > I would be okay with removing option 3, and a clear warning on option 2
> > is also okay.
> >
> > Jens
> >
> > On 11/4/25 13:06, Jarek Potiuk wrote:
> > > My take (details can be found in the discussion):
> > >
> > > 2. Don't give the impression that it is something we will support, and
> > > explain to users that it **WILL** break in the future and it's on
> > > **THEM** to fix it when it breaks.
> > >
> > > Option 2 is **kinda** possible, but we should strongly discourage it and
> > > say "this will break at any time, and it's you who has to adapt to any
> > > future changes in the schema". We had a lot of similar cases in the past
> > > where our users felt entitled to complain when **something** they
> > > considered a "valid way of using things" was broken by our changes. If we
> > > say "recommended", they will take it as "all usage there is expected to
> > > keep working when Airflow gets a new version, so I am fully entitled to
> > > open a valid issue when things change". I think "recommended" is far too
> > > strong a word from our side in this case.
> > >
> > > 3. Absolutely remove.
> > >
> > > Sounds like we are going back to Airflow 2 behaviour. And we've made all
> > > the effort to break out of that. Various things will start breaking in
> > > Airflow 3.2 and beyond. Once we complete the task isolation work, Airflow
> > > workers will NOT have the sqlalchemy package installed by default - it
> > > simply will not be a task-sdk dependency. The fact that you **can** use
> > > sqlalchemy now is mostly a by-product of the fact that we have not
> > > completed the split yet - but it was not even **SUPPOSED** to work.
> > >
> > > J.
> > >
> > >
> > > On Tue, Nov 4, 2025 at 10:03 AM Amogh Desai wrote:
> > >
> > > > Hi All,
> > > >
> > > > I'm working on expanding the Airflow 3 upgrade documentation to address
> > > > a frequently asked question from users migrating from Airflow 2.x: "How
> > > > do I access the metadata database from my tasks now that direct database
> > > > access is blocked?"
> > > >
> > > > Currently, Step 5 of the upgrade guide [1] only mentions that direct DB
> > > > access is blocked and points to a GitHub issue. However, users need
> > > > concrete guidance on migration options.
> > > >
> > > > I've drafted documentation in [2] describing three approaches, but before
> > > > finalising it, I'd like to get community consensus on how we should
> > > > present these options, especially given the architectural principles
> > > > we've established with Airflow 3.
> > > >
> > > > ## Proposed Approaches
> > > >
> > > > Approach 1: Airflow Python Client (REST API)
> > > > - Uses `apache-airflow-client` [3] to interact via the REST API
> > > > - Pros: No DB drivers needed, aligned with the Airflow 3 architecture,
> > > >   API-first
> > > > - Cons: Requires package installation, API server dependency, auth token
> > > >   management, limited set of possible operations
> > > >
> > > > Approach 2: Database Hooks (PostgresHook/MySqlHook)
> > > > - Create a connection to the metadata DB and use DB hooks to execute SQL
> > > >   directly
> > > > - Pros: Uses Airflow connection management, simple SQL interface
> > > > - Cons: Requires DB drivers and direct network access; bypasses the
> > > >   Airflow API server and connects to the DB directly
> > > >
> > > > Approach 3: Direct SQLAlchemy Access (last resort)
> > > > - Use an environment variable with the DB connection string and create a
> > > >   SQLAlchemy session directly
> > > > - Pros: Maximum flexibility
> > > > - Cons: Bypasses all Airflow protections, schema coupling, manual
> > > >   connection management; the worst possible option
> > > >
> > > > I was expecting some pushback on these approaches, and (rightly) some
> > > > important concerns were raised by Jarek about Approaches 2 and 3:
> > > >
> > > > 1. Breaks Task Isolation - contradicts Airflow 3's core promise
> > > > 2. DB as Public Interface - schema changes would require release notes
> > > >    and would break user code
> > > > 3. Performance Impact - Approach 2 creates direct DB access and can
> > > >    bring back Airflow 2's connection-per-task overhead
> > > > 4. Security Model Violation - contradicts documented isolation principles
> > > >
> > > > Considering these comments, this is what I now want to document:
> > > >
> > > > 1. Approach 1 - keep as the primary/recommended solution (aligns with
> > > >    the Airflow 3 architecture)
> > > > 2. Approach 2 - present as a "known workaround" (not a recommendation)
> > > >    with explicit warnings about breaking isolation, the schema not being
> > > >    a public API, performance implications, and no support guarantees
> > > > 3. Approach 3 - remove entirely, or keep with the strongest possible
> > > >    warnings (I would love to hear what others think about this one in
> > > >    particular)
> > > >
> > > > Once the discussion has converged on some points, I would like to call
> > > > for lazy consensus, for posterity and for visibility to the community.
> > > >
> > > > Looking forward to your feedback!
> > > >
> > > > [1] https://github.com/apache/airflow/blob/main/airflow-core/docs/installation/upgrading_to_airflow3.rst#step-5-review-custom-operators-for-direct-db-access
> > > > [2] https://github.com/apache/airflow/pull/57479
> > > > [3] https://github.com/apache/airflow-client-python
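To make Approach 1 a bit more concrete for two of the use cases Jens listed (mass-triggering Dags and reading task states of a run instead of querying the DB), here is a rough sketch using only the Python standard library rather than `apache-airflow-client`. It assumes the Airflow 3 public REST API is served under `/api/v2` on the API server, that a bearer token has already been obtained, and that the request body fields (`logical_date`, `conf`) match your Airflow version; please check them against the REST API reference. The helper names are hypothetical, and the requests are only built, not sent, so the sketch runs without a live server.

```python
# Hypothetical helpers sketching Approach 1 (REST API) for two of the use cases above.
# Assumptions: Airflow 3 serves its public REST API under /api/v2, a bearer token
# is already available, and the body fields below match your Airflow version.
import json
import urllib.parse
import urllib.request

API_PREFIX = "/api/v2"  # assumed public API prefix for Airflow 3


def build_request(base_url, token, method, path, body=None):
    """Build (but do not send) an authenticated JSON request to the REST API."""
    url = base_url.rstrip("/") + API_PREFIX + path
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(url, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    req.add_header("Accept", "application/json")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def _seg(value):
    """Percent-encode a single path segment (run ids contain ':' and '+')."""
    return urllib.parse.quote(value, safe="")


def trigger_dag_run(base_url, token, dag_id, conf):
    """Mass-triggering: create a Dag run via the API instead of inserting DB rows."""
    # logical_date=None is an assumption: it asks for a run without a logical date.
    body = {"logical_date": None, "conf": conf}
    return build_request(base_url, token, "POST", f"/dags/{_seg(dag_id)}/dagRuns", body)


def list_task_instances(base_url, token, dag_id, dag_run_id):
    """Upstream status: read the task instances of one Dag run via the API."""
    path = f"/dags/{_seg(dag_id)}/dagRuns/{_seg(dag_run_id)}/taskInstances"
    return build_request(base_url, token, "GET", path)


if __name__ == "__main__":
    # Build trigger requests for a few hypothetical target Dags; against a real
    # API server each one would be sent with urllib.request.urlopen(req).
    for target in ["etl_part_1", "etl_part_2"]:
        req = trigger_dag_run("http://localhost:8080", "<token>", target, {"batch": target})
        print(req.get_method(), req.full_url)
```

The point of the sketch is only that these operations go through the API server (with its auth and access checks) rather than through the metadata schema; in practice the `apache-airflow-client` package wraps the same REST endpoints with typed models.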
