This is an automated email from the ASF dual-hosted git repository.

ephraimanierobi pushed a commit to branch v3-1-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit 930dce3fc8d59a81ad9f1a1a680a9cdae367c3c1
Author: Amogh Desai <[email protected]>
AuthorDate: Tue Nov 18 02:27:33 2025 +0530

    Add migration options for Airflow 2 users using database access in tasks (#57479)
    
    * Add migration options for Airflow 2 users using database access in tasks
    
    * add link to python client
    
    * handling comments from phani
    
    * suggestions from rahul
    
    * Apply suggestion from @rawwar
    
    Co-authored-by: Kalyan R <[email protected]>
    
    * Apply suggestion from @rawwar
    
    Co-authored-by: Kalyan R <[email protected]>
    
    * adding pro and con for each
    
    * fixing underline
    
    * update things
    
    * fixing static checks
    
    * Update after lazy consensus
    
    ---------
    
    Co-authored-by: Kalyan R <[email protected]>
    (cherry picked from commit 10d55caaea0316961dfc3e476de6d999c2168bd7)
---
 .../docs/installation/upgrading_to_airflow3.rst    | 95 ++++++++++++++++++++--
 1 file changed, 90 insertions(+), 5 deletions(-)

diff --git a/airflow-core/docs/installation/upgrading_to_airflow3.rst b/airflow-core/docs/installation/upgrading_to_airflow3.rst
index 0aaf0a51c02..4f7f8d5ce56 100644
--- a/airflow-core/docs/installation/upgrading_to_airflow3.rst
+++ b/airflow-core/docs/installation/upgrading_to_airflow3.rst
@@ -192,12 +192,97 @@ Step 4: Install the Standard Provider
 - For convenience, this package can also be installed on Airflow 2.x versions, so that Dags can be modified to reference these Operators from the standard provider
   package instead of Airflow Core.
 
-Step 5: Review custom operators for direct db access
-----------------------------------------------------
+Step 5: Review custom-written tasks for direct DB access
+--------------------------------------------------------
 
-- In Airflow 3 operators can not access the Airflow metadata database directly using database sessions.
-  If you have custom operators, review the code to make sure there are no direct db access.
-  You can follow examples in https://github.com/apache/airflow/issues/49187 to find how to modify your code if needed.
+In Airflow 3, operators cannot access the Airflow metadata database directly using database sessions.
+If you have custom operators, review your code to ensure there are no direct database access calls.
+You can follow examples in https://github.com/apache/airflow/issues/49187 to learn how to modify your code if needed.
+
+If you have custom operators or task code that previously accessed the metadata database directly, you must migrate to one of the following approaches:
+
+Recommended Approach: Use Airflow Python Client
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Use the official `Airflow Python Client <https://github.com/apache/airflow-client-python>`_ to interact with
+the Airflow metadata database via the REST API. The Python Client has APIs defined for most use cases, including DagRuns,
+TaskInstances, Variables, Connections, XComs, and more.
+
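+As a rough illustration, the sketch below lists Postgres connections from inside a task via the REST API instead of
+querying the ``connection`` table directly. The client module, class, and method names (``airflow_client.client``,
+``ConnectionApi``, ``get_connections``) as well as the API server URL and token value are assumptions based on the
+OpenAPI-generated client layout; check the client's README for the exact interface of the version you install.
+
+.. code-block:: python
+
+   from airflow.sdk import task
+
+
+   @task
+   def list_postgres_connections():
+       # The names below are assumptions; verify them against the apache-airflow-client
+       # version you install.
+       import airflow_client.client as client
+       from airflow_client.client.api.connection_api import ConnectionApi
+
+       configuration = client.Configuration(host="http://localhost:8080")  # placeholder API server URL
+       configuration.access_token = "<token obtained from /auth/token>"  # placeholder token
+
+       with client.ApiClient(configuration) as api_client:
+           result = ConnectionApi(api_client).get_connections(limit=100)
+
+       return [c.connection_id for c in result.connections if c.conn_type == "postgres"]
+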
+**Pros:**
+
+- No direct database network access required from workers
+- Most aligned with Airflow 3's API-first architecture
+- No database credentials needed in worker environment (uses API tokens)
+- Workers don't need database drivers installed
+- Centralized access control and authentication via API server
+
+**Cons:**
+
+- Requires installing the ``apache-airflow-client`` package
+- Requires acquiring access tokens through an API call to ``/auth/token`` and rotating them as needed (a minimal sketch follows this list)
+- Requires API server availability and network access to the API server
+- Not all database operations may be exposed via API endpoints
+
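+For the token acquisition mentioned above, a minimal sketch using ``requests`` is shown below. The URL, credentials,
+JSON payload, and ``access_token`` response field are assumptions that depend on the auth manager configured for your
+deployment; adapt them to your authentication flow.
+
+.. code-block:: python
+
+   import requests
+
+   # Placeholder URL and credentials; the payload and response shape depend on the
+   # auth manager configured for your deployment.
+   response = requests.post(
+       "http://localhost:8080/auth/token",
+       json={"username": "my_api_user", "password": "my_api_password"},
+       timeout=10,
+   )
+   response.raise_for_status()
+   access_token = response.json()["access_token"]  # assumed response field name
+
+   # Send the token as a Bearer token on subsequent REST API calls.
+   headers = {"Authorization": f"Bearer {access_token}"}
+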
+.. note::
+   If you need functionality that is not available via the Airflow Python Client, consider requesting new API endpoints or Task SDK features. The Airflow community prioritizes adding missing API capabilities over enabling direct database access.
+
+Known Workaround: Use DbApiHook (PostgresHook or MySqlHook)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. warning::
+   This approach is **NOT recommended** and is documented only as a known workaround for users who cannot use the Airflow Python Client. This approach has significant limitations and **will** break in future Airflow versions.
+
+   **Important considerations:**
+
+   - **Will break in future versions**: This approach will break in Airflow 3.2+ and beyond. You are responsible for adapting your code when schema changes occur.
+   - **Database schema is NOT a public API**: The Airflow metadata database schema can change at any time without notice. Schema changes will break your queries without warning.
+   - **Breaks task isolation**: This contradicts one of Airflow 3's core features - task isolation. Tasks should not directly access the metadata database.
+   - **Performance implications**: This reintroduces Airflow 2 behavior where each task opens separate database connections, dramatically changing performance characteristics and scalability.
+
+If your use case cannot be addressed using the Python Client and you understand the risks above, you may use database hooks to query your metadata database directly.
+Create an Airflow connection (PostgreSQL or MySQL, matching your metadata database type) that points to the metadata database and use it with the corresponding database hook.
+
+.. note::
+   These hooks connect directly to the database (not via the API server) using database drivers such as psycopg2 or mysqlclient.
+
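+The hook and operator examples below reference a connection id of ``metadata_postgres``. One possible way to supply
+it, sketched here, is Airflow's ``AIRFLOW_CONN_<CONN_ID>`` environment variable mechanism; the user, password, host,
+and database name are placeholders for your deployment.
+
+.. code-block:: python
+
+   import os
+
+   # Illustrative only: this variable is normally set in the worker's deployment
+   # environment rather than in Python code. All values below are placeholders.
+   os.environ["AIRFLOW_CONN_METADATA_POSTGRES"] = (
+       "postgres://readonly_user:readonly_password@metadata-db-host:5432/airflow"
+   )
+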
+**Example using PostgresHook (MySqlHook has a similar interface)**
+
+.. code-block:: python
+
+   from airflow.sdk import task
+   from airflow.providers.postgres.hooks.postgres import PostgresHook
+
+
+   @task
+   def get_connections_from_db():
+       hook = PostgresHook(postgres_conn_id="metadata_postgres")
+       records = hook.get_records(
+           sql="""
+           SELECT conn_id, conn_type, host, schema, login
+           FROM connection
+           WHERE conn_type = 'postgres'
+           LIMIT 10;
+           """
+       )
+
+       return records
+
+**Example using SQLExecuteQueryOperator**
+
+You can also use ``SQLExecuteQueryOperator`` if you prefer to use operators instead of hooks:
+
+.. code-block:: python
+
+   from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator
+
+   query_task = SQLExecuteQueryOperator(
+       task_id="query_metadata",
+       conn_id="metadata_postgres",
+       sql="SELECT conn_id, conn_type FROM connection WHERE conn_type = 'postgres'",
+       do_xcom_push=True,
+   )
+
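+Because ``do_xcom_push=True`` stores the query result as the task's return value XCom, a downstream task can consume
+it. A brief sketch follows; the task name and wiring are illustrative and assume both tasks are defined inside the
+same Dag.
+
+.. code-block:: python
+
+   from airflow.sdk import task
+
+
+   @task
+   def count_postgres_connections(records):
+       # "records" receives the rows pushed to XCom by query_metadata above.
+       return len(records)
+
+
+   # Inside the same Dag definition:
+   # count_postgres_connections(query_task.output)
+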
+.. note::
+   Always use **read-only database credentials** for metadata database connections; temporary credentials are recommended.
 
 Step 6: Deployment Managers - Upgrade your Airflow Instance
 ------------------------------------------------------------
