amoghrajesh commented on code in PR #52297:
URL: https://github.com/apache/airflow/pull/52297#discussion_r2181854222
##########
airflow-core/docs/core-concepts/xcoms.rst:
##########
@@ -93,6 +94,9 @@ The XCom system has interchangeable backends, and you can set which backend is b
 If you want to implement your own backend, you should subclass :class:`~airflow.models.xcom.BaseXCom`, and override the ``serialize_value`` and ``deserialize_value`` methods.

Review Comment:
```suggestion
If you want to implement your own backend, you should subclass :class:`~airflow.sdk.bases.xcom.BaseXCom`, and override the ``serialize_value`` and ``deserialize_value`` methods.
```

##########
airflow-core/docs/public-airflow-interface.rst:
##########
@@ -25,6 +36,13 @@ and extending Airflow capabilities by writing new executors, plugins, operators
 Public Interface can be useful for building custom tools and integrations with other systems, and for automating certain aspects of the Airflow workflow.
+The primary public interface for DAG Authors and task execution is using task SDK

Review Comment:
```suggestion
The primary public interface for DAG Authors and task execution is using task SDK
Airflow task SDK is the primary public interface for DAG Authors and for task execution
```

##########
airflow-core/docs/core-concepts/xcoms.rst:
##########
@@ -25,6 +25,9 @@ XComs (short for "cross-communications") are a mechanism that let :doc:`tasks` t
 An XCom is identified by a ``key`` (essentially its name), as well as the ``task_id`` and ``dag_id`` it came from. They can have any serializable value (including objects that are decorated with ``@dataclass`` or ``@attr.define``, see :ref:`TaskFlow arguments <concepts:arbitrary-arguments>`:), but they are only designed for small amounts of data; do not use them to pass around large values, like dataframes.
+XCom operations should be performed through the Task Context using
+:func:`~airflow.sdk.get_current_context`. Direct database access is not possible.
Review Comment:
```suggestion
XCom operations should be performed through the Task Context using
:func:`~airflow.sdk.get_current_context`. Directly updating using XCom database model is not possible.
```

##########
airflow-core/docs/public-airflow-interface.rst:
##########
@@ -77,64 +153,86 @@ You can read more about dags in :doc:`Dags <core-concepts/dags>`.
 References for the modules used in dags are here:
-.. toctree::
-    :includehidden:
-    :glob:
-    :maxdepth: 1
-
-    _api/airflow/models/dag/index
-    _api/airflow/models/dagbag/index
+.. note::
+    The airflow.sdk namespace provides the primary interface for DAG Authors.
+    For detailed API documentation, see the `Task SDK Reference <https://airflow.apache.org/docs/task-sdk/stable/>`_.
-Properties of a :class:`~airflow.models.dagrun.DagRun` can also be referenced in things like :ref:`Templates <templates-ref>`.
-
-.. toctree::
-    :includehidden:
-    :glob:
-    :maxdepth: 1
+.. note::
+    The :class:`~airflow.models.dagbag.DagBag` class is used internally by Airflow for loading DAGs
+    from files and folders. DAG Authors should use the :class:`~airflow.sdk.DAG` class from the
+    airflow.sdk namespace instead.
-    _api/airflow/models/dagrun/index
+.. note::
+    The :class:`~airflow.models.dagrun.DagRun` class is used internally by Airflow for DAG run
+    management. DAG Authors should access DAG run information through the Task Context via
+    :func:`~airflow.sdk.get_current_context` or use the :class:`~airflow.sdk.types.DagRunProtocol`
+    interface.
 .. _pythonapi:operators:
 Operators
----------
+=========
-The base classes :class:`~airflow.models.baseoperator.BaseOperator` and :class:`~airflow.sensors.base.BaseSensorOperator` are public and may be extended to make new operators.
+The base classes :class:`~airflow.sdk.BaseOperator` and :class:`~airflow.sdk.BaseSensorOperator` are public and may be extended to make new operators.
+
+The recommended base class for new operators is :class:`~airflow.sdk.BaseOperator`
+from the airflow.sdk namespace.
 Subclasses of BaseOperator which are published in Apache Airflow are public in *behavior* but not in *structure*. That is to say, the Operator's parameters and behavior is governed by semver but the methods are subject to change at any time.
 Task Instances
---------------
+==============
 Task instances are the individual runs of a single task in a DAG (in a DAG Run). They are available in the context
-passed to the execute method of the operators via the :class:`~airflow.models.taskinstance.TaskInstance` class.
+passed to the execute method of the operators via the :class:`~airflow.sdk.types.RuntimeTaskInstanceProtocol` class.
-.. toctree::
-    :includehidden:
-    :glob:
-    :maxdepth: 1
-
-    _api/airflow/models/taskinstance/index
+Task instances are accessed through the Task Context via :func:`~airflow.sdk.get_current_context`
+Direct database access is not possible. The :class:`~airflow.sdk.types.RuntimeTaskInstanceProtocol` provides
+the stable interface for task instance operations.
+.. note::
+    Task Context and RuntimeTaskInstanceProtocol are part of the airflow.sdk namespace.
+    For detailed API documentation, see the `Task SDK Reference <https://airflow.apache.org/docs/task-sdk/stable/>`_.
 Task Instance Keys
-------------------
+==================
 Task instance keys are unique identifiers of task instances in a DAG (in a DAG Run). A key is a tuple that consists of
-``dag_id``, ``task_id``, ``run_id``, ``try_number``, and ``map_index``. The key of a task instance can be retrieved via
-:meth:`~airflow.models.taskinstance.TaskInstance.key`.
+``dag_id``, ``task_id``, ``run_id``, ``try_number``, and ``map_index``.
-.. toctree::
-    :includehidden:
-    :glob:
-    :maxdepth: 1
+Direct access to task instance keys via the :class:`~airflow.models.taskinstance.TaskInstance`
+model is no longer allowed from task code. Instead, use the Task Context via :func:`~airflow.sdk.get_current_context`
+to access task instance information.
+
+Example of accessing task instance information through Task Context:
+
+.. code-block:: python
+
+    from airflow.sdk import get_current_context
+
+
+    def my_task():
+        context = get_current_context()
+        ti = context["ti"]
+
+        dag_id = ti.dag_id
+        task_id = ti.task_id
+        run_id = ti.run_id
+        try_number = ti.try_number
+        map_index = ti.map_index
+
+        print(f"Task: {dag_id}.{task_id}, Run: {run_id}, Try: {try_number}, Map: {map_index}")

Review Comment:
```suggestion
        print(f"Task: {dag_id}.{task_id}, Run: {run_id}, Try: {try_number}, Map Index: {map_index}")
```

##########
airflow-core/docs/public-airflow-interface.rst:
##########
@@ -77,64 +153,86 @@ You can read more about dags in :doc:`Dags <core-concepts/dags>`.
 References for the modules used in dags are here:
-.. toctree::
-    :includehidden:
-    :glob:
-    :maxdepth: 1
-
-    _api/airflow/models/dag/index
-    _api/airflow/models/dagbag/index
+.. note::
+    The airflow.sdk namespace provides the primary interface for DAG Authors.
+    For detailed API documentation, see the `Task SDK Reference <https://airflow.apache.org/docs/task-sdk/stable/>`_.
-Properties of a :class:`~airflow.models.dagrun.DagRun` can also be referenced in things like :ref:`Templates <templates-ref>`.
-
-.. toctree::
-    :includehidden:
-    :glob:
-    :maxdepth: 1
+.. note::
+    The :class:`~airflow.models.dagbag.DagBag` class is used internally by Airflow for loading DAGs
+    from files and folders. DAG Authors should use the :class:`~airflow.sdk.DAG` class from the
+    airflow.sdk namespace instead.
-    _api/airflow/models/dagrun/index
+.. note::
+    The :class:`~airflow.models.dagrun.DagRun` class is used internally by Airflow for DAG run
+    management. DAG Authors should access DAG run information through the Task Context via
+    :func:`~airflow.sdk.get_current_context` or use the :class:`~airflow.sdk.types.DagRunProtocol`
+    interface.
 .. _pythonapi:operators:
 Operators
----------
+=========
-The base classes :class:`~airflow.models.baseoperator.BaseOperator` and :class:`~airflow.sensors.base.BaseSensorOperator` are public and may be extended to make new operators.
+The base classes :class:`~airflow.sdk.BaseOperator` and :class:`~airflow.sdk.BaseSensorOperator` are public and may be extended to make new operators.
+
+The recommended base class for new operators is :class:`~airflow.sdk.BaseOperator`
+from the airflow.sdk namespace.

Review Comment:
Makes it sound like there's another option, maybe get rid of this bit?

##########
airflow-core/docs/public-airflow-interface.rst:
##########
@@ -77,64 +153,86 @@ You can read more about dags in :doc:`Dags <core-concepts/dags>`.
 References for the modules used in dags are here:
-.. toctree::
-    :includehidden:
-    :glob:
-    :maxdepth: 1
-
-    _api/airflow/models/dag/index
-    _api/airflow/models/dagbag/index
+.. note::
+    The airflow.sdk namespace provides the primary interface for DAG Authors.
+    For detailed API documentation, see the `Task SDK Reference <https://airflow.apache.org/docs/task-sdk/stable/>`_.
-Properties of a :class:`~airflow.models.dagrun.DagRun` can also be referenced in things like :ref:`Templates <templates-ref>`.
-
-.. toctree::
-    :includehidden:
-    :glob:
-    :maxdepth: 1
+.. note::
+    The :class:`~airflow.models.dagbag.DagBag` class is used internally by Airflow for loading DAGs
+    from files and folders. DAG Authors should use the :class:`~airflow.sdk.DAG` class from the
+    airflow.sdk namespace instead.
-    _api/airflow/models/dagrun/index
+.. note::
+    The :class:`~airflow.models.dagrun.DagRun` class is used internally by Airflow for DAG run
+    management. DAG Authors should access DAG run information through the Task Context via
+    :func:`~airflow.sdk.get_current_context` or use the :class:`~airflow.sdk.types.DagRunProtocol`
+    interface.
 .. _pythonapi:operators:
 Operators
----------
+=========
-The base classes :class:`~airflow.models.baseoperator.BaseOperator` and :class:`~airflow.sensors.base.BaseSensorOperator` are public and may be extended to make new operators.
+The base classes :class:`~airflow.sdk.BaseOperator` and :class:`~airflow.sdk.BaseSensorOperator` are public and may be extended to make new operators.
+
+The recommended base class for new operators is :class:`~airflow.sdk.BaseOperator`
+from the airflow.sdk namespace.
 Subclasses of BaseOperator which are published in Apache Airflow are public in *behavior* but not in *structure*. That is to say, the Operator's parameters and behavior is governed by semver but the methods are subject to change at any time.
 Task Instances
---------------
+==============
 Task instances are the individual runs of a single task in a DAG (in a DAG Run). They are available in the context
-passed to the execute method of the operators via the :class:`~airflow.models.taskinstance.TaskInstance` class.
+passed to the execute method of the operators via the :class:`~airflow.sdk.types.RuntimeTaskInstanceProtocol` class.

Review Comment:
Hmmm, do we want to expose the `RuntimeTaskInstanceProtocol` in docs? cc @kaxil

##########
airflow-core/docs/public-airflow-interface.rst:
##########
@@ -15,6 +15,17 @@
     specific language governing permissions and limitations
     under the License.
+**PUBLIC INTERFACE FOR AIRFLOW 3.0+**
+=====================================
+
+.. warning::
+
+    **This documentation covers the Public Interface for Airflow 3.0+**
+
+    If you are using Airflow 2.x, please refer to the
+    `Airflow 2.11 Public Interface Documentation <https://airflow.apache.org/docs/apache-airflow/2.11.0/public-airflow-interface.html>`_
+    for the legacy interface.

Review Comment:
cc @potiuk is this in line with what you expected?
##########
airflow-core/docs/public-airflow-interface.rst:
##########
@@ -151,33 +249,58 @@ by extending them:
     _api/airflow/hooks/index
 Public Airflow utilities
-------------------------
+========================
-When writing or extending Hooks and Operators, DAG authors and developers can
+When writing or extending Hooks and Operators, DAG Authors and developers can
 use the following classes:
-* The :class:`~airflow.models.connection.Connection`, which provides access to external service credentials and configuration.
-* The :class:`~airflow.models.variable.Variable`, which provides access to Airflow configuration variables.
-* The :class:`~airflow.models.xcom.XCom` which are used to access to inter-task communication data.
+* The :class:`~airflow.sdk.Connection`, which provides access to external service credentials and configuration.
+* The :class:`~airflow.sdk.Variable`, which provides access to Airflow configuration variables.
+* The :class:`~airflow.sdk.execution_time.xcom.XCom` which are used to access to inter-task communication data.
+
+Connection and Variable operations should be performed through the Task Context using
+:func:`~airflow.sdk.get_current_context` and the task instance's methods, or through the airflow.sdk namespace.
+Direct database access to :class:`~airflow.models.connection.Connection` and :class:`~airflow.models.variable.Variable`
+models is no longer allowed from task code.
+
+Example of accessing connections and variables through Task Context:
+
+.. code-block:: python
+
+    from airflow.sdk import get_current_context
+
+
+    def my_task():
+        context = get_current_context()
+
+        conn = context["conn"]
+        my_connection = conn.get_connection("my_connection_id")
+
+        var = context["var"]
+        my_variable = var.get("my_variable_name")
+
+Example of using airflow.sdk namespace directly:
+
+.. code-block:: python
+
+    from airflow.sdk import Connection, Variable
+
+    conn = Connection.get_connection("my_connection_id")

Review Comment:
```suggestion
    conn = Connection.get("my_connection_id")
```
This isn't right. its

##########
airflow-core/docs/public-airflow-interface.rst:
##########
@@ -417,3 +541,55 @@ but in Airflow they are not parts of the Public Interface and might change any t
 * Python classes except those explicitly mentioned in this document, are considered an internal implementation detail and you should not assume they will be maintained in a backwards-compatible way.
+
+**Direct metadata database access from task code is no longer allowed**.
+Task code cannot directly access the metadata database to query DAG state, task history,
+or DAG runs. Instead, use one of the following alternatives:
+
+* **Task Context**: Use :func:`~airflow.sdk.get_current_context` to access task instance
+  information and methods like :meth:`~airflow.sdk.types.RuntimeTaskInstanceProtocol.get_dr_count`,
+  :meth:`~airflow.sdk.types.RuntimeTaskInstanceProtocol.get_dagrun_state`, and
+  :meth:`~airflow.sdk.types.RuntimeTaskInstanceProtocol.get_task_states`.
+
+* **REST API**: Use the :doc:`Stable REST API <stable-rest-api-ref>` for programmatic
+  access to Airflow metadata.
+
+* **Python Client**: Use the `Python Client <https://github.com/apache/airflow-client-python>`_ for Python-based
+  interactions with Airflow.
+
+This change improves architectural separation and enables remote execution capabilities.
+
+Example of using Task Context instead of direct database access:
+
+.. code-block:: python
+
+    from airflow.sdk import dag, get_current_context, task
+    from airflow.utils.state import DagRunState
+    from datetime import datetime
+
+
+    @dag(dag_id="example_dag", start_date=datetime(2025, 1, 1), schedule="@hourly", tags=["misc"], catchup=False)
+    def example_dag():
+
+        @task(task_id="check_dagrun_state")
+        def check_state():
+            context = get_current_context()
+            ti = context["ti"]
+            dag_run = context["dag_run"]
+
+            # Use Task Context methods instead of direct DB access
+            dr_count = ti.get_dr_count(dag_id="example_dag")
+            dagrun_state = ti.get_dagrun_state(dag_id="example_dag", run_id=dag_run.run_id)
+
+            return f"DAG run count: {dr_count}, current state: {dagrun_state}"
+
+        check_state()
+
+
+    example_dag()
+
+.. note::
+
+    **For Airflow 2.x users**: If you are using Airflow 2.x, please refer to the
+    `Airflow 2.11 Public Interface Documentation <https://airflow.apache.org/docs/apache-airflow/2.11.0/public-airflow-interface.html>`_
+    for the legacy interface.

Review Comment:
Is this needed again? Feels repetitive

##########
airflow-core/docs/public-airflow-interface.rst:
##########
@@ -151,33 +249,58 @@ by extending them:
     _api/airflow/hooks/index
 Public Airflow utilities
-------------------------
+========================
-When writing or extending Hooks and Operators, DAG authors and developers can
+When writing or extending Hooks and Operators, DAG Authors and developers can
 use the following classes:
-* The :class:`~airflow.models.connection.Connection`, which provides access to external service credentials and configuration.
-* The :class:`~airflow.models.variable.Variable`, which provides access to Airflow configuration variables.
-* The :class:`~airflow.models.xcom.XCom` which are used to access to inter-task communication data.
+* The :class:`~airflow.sdk.Connection`, which provides access to external service credentials and configuration.
+* The :class:`~airflow.sdk.Variable`, which provides access to Airflow configuration variables.
+* The :class:`~airflow.sdk.execution_time.xcom.XCom` which are used to access to inter-task communication data.
+
+Connection and Variable operations should be performed through the Task Context using
+:func:`~airflow.sdk.get_current_context` and the task instance's methods, or through the airflow.sdk namespace.
+Direct database access to :class:`~airflow.models.connection.Connection` and :class:`~airflow.models.variable.Variable`
+models is no longer allowed from task code.
+
+Example of accessing connections and variables through Task Context:

Review Comment:
```suggestion
Example of accessing Connections and Variables through Task Context:
```

##########
airflow-core/docs/public-airflow-interface.rst:
##########
@@ -151,33 +249,58 @@ by extending them:
     _api/airflow/hooks/index
 Public Airflow utilities
-------------------------
+========================
-When writing or extending Hooks and Operators, DAG authors and developers can
+When writing or extending Hooks and Operators, DAG Authors and developers can
 use the following classes:
-* The :class:`~airflow.models.connection.Connection`, which provides access to external service credentials and configuration.
-* The :class:`~airflow.models.variable.Variable`, which provides access to Airflow configuration variables.
-* The :class:`~airflow.models.xcom.XCom` which are used to access to inter-task communication data.
+* The :class:`~airflow.sdk.Connection`, which provides access to external service credentials and configuration.
+* The :class:`~airflow.sdk.Variable`, which provides access to Airflow configuration variables.
+* The :class:`~airflow.sdk.execution_time.xcom.XCom` which are used to access to inter-task communication data.
+
+Connection and Variable operations should be performed through the Task Context using
+:func:`~airflow.sdk.get_current_context` and the task instance's methods, or through the airflow.sdk namespace.
+Direct database access to :class:`~airflow.models.connection.Connection` and :class:`~airflow.models.variable.Variable`
+models is no longer allowed from task code.
+
+Example of accessing connections and variables through Task Context:
+
+.. code-block:: python
+
+    from airflow.sdk import get_current_context
+
+
+    def my_task():
+        context = get_current_context()
+
+        conn = context["conn"]
+        my_connection = conn.get_connection("my_connection_id")
+
+        var = context["var"]
+        my_variable = var.get("my_variable_name")

Review Comment:
Have you validated this? The connection certainly doesn't look right..
