ferruzzi commented on code in PR #34324:
URL: https://github.com/apache/airflow/pull/34324#discussion_r1339321927
##########
docs/apache-airflow/core-concepts/executor/index.rst:
##########
@@ -78,3 +71,134 @@ There are two types of executor - those that run tasks
*locally* (inside the ``s
.. note::
New Airflow users may assume they need to run a separate executor process
using one of the Local or Remote Executors. This is not correct. The executor
logic runs *inside* the scheduler process, and will run the tasks locally or
not depending on the executor selected.
+
+Writing Your Own Executor
+-------------------------
+
+All Airflow executors implement a common interface so that they are pluggable
and any executor has access to all abilities and integrations within Airflow.
Primarily, the Airflow scheduler uses this interface to interact with the
executor, but other components such as logging, CLI and backfill do as well.
+The public interface is the
:class:`~airflow.executors.base_executor.BaseExecutor`. You can look through
the code for the most detailed and up to date interface, but some important
highlights are outlined below:
+
+.. note::
+ For more information about Airflow's public interface see
:doc:`/public-airflow-interface`.
+
+Some reasons you may want to write a custom executor include:
+
+* An executor does not exist which fits your specific use case, such as a
specific tool or service for compute.
+* You'd like to use an executor that leverages a compute service but from your
preferred cloud provider.
+* You have a private tool/service for task execution that is only available to
you or your organization.
+
+
+Important BaseExecutor Methods
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+These methods don't require overriding to implement your own executor, but are
useful to be aware of:
+
+* ``heartbeat``: The Airflow scheduler Job loop will periodically call
heartbeat on the executor. This is one of the main points of interaction
between the Airflow scheduler and the executor. This method updates some
metrics, triggers newly queued tasks to execute and updates state of
running/completed tasks.
+* ``queue_command``: The Airflow scheduler will call this method of the
BaseExecutor to provide tasks to be run by the executor. The BaseExecutor
simply adds the TaskInstances to an internal list of queued tasks within the
executor.
+* ``get_event_buffer``: The Airflow scheduler calls this method to retrieve
the current state of the TaskInstances the executor is executing.
+* ``has_task``: The scheduler uses this BaseExecutor method to determine if an
executor already has a specific task instance queued or running.
+* ``send_callback``: Sends any callbacks to the sink configured on the
executor.
+
+
+Mandatory Methods to Implement
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following methods must be overridden at minimum to have your executor
supported by Airflow:
+
+* ``sync``: Sync is called periodically during executor heartbeats.
Implement this method to update the state of the tasks which the executor knows
about. Optionally, it can also attempt to execute queued tasks that have been
received from the scheduler.
+* ``execute_async``: Executes a command asynchronously. A command in this
context is an Airflow CLI command to run an Airflow task. This method is called
(after a few layers) during the executor heartbeat, which is run periodically by
the scheduler. In practice, this method often just enqueues tasks into an
internal or external queue of tasks to be run (e.g. ``KubernetesExecutor``), but
it can also execute the tasks directly (e.g. ``LocalExecutor``), depending on
the executor.
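
The relationship between ``heartbeat``, ``execute_async`` and ``sync`` described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not Airflow code: ``StandInBaseExecutor`` is a simplified stand-in for ``BaseExecutor`` (the real method signatures take more arguments), and ``fake_run`` and ``ThreadPoolSketchExecutor`` are hypothetical names introduced only for this example.

```python
from __future__ import annotations

from concurrent.futures import Future, ThreadPoolExecutor


class StandInBaseExecutor:
    """Simplified stand-in for BaseExecutor; NOT Airflow's real class."""

    def __init__(self) -> None:
        self.queued_tasks: dict[str, list[str]] = {}
        self.event_buffer: dict[str, str] = {}

    def queue_command(self, key: str, command: list[str]) -> None:
        # The base class only records the task; nothing runs yet.
        self.queued_tasks[key] = command

    def heartbeat(self) -> None:
        # Called periodically by the scheduler loop: hand queued work to
        # execute_async, then let sync report on progress.
        while self.queued_tasks:
            key, command = self.queued_tasks.popitem()
            self.execute_async(key, command)
        self.sync()

    def get_event_buffer(self) -> dict[str, str]:
        # Return the collected state changes and clear the buffer.
        events, self.event_buffer = self.event_buffer, {}
        return events

    def execute_async(self, key: str, command: list[str]) -> None:
        raise NotImplementedError

    def sync(self) -> None:
        raise NotImplementedError


def fake_run(command: list[str]) -> None:
    """Hypothetical placeholder for running an Airflow CLI command."""
    if "fail" in command:
        raise RuntimeError("simulated task failure")


class ThreadPoolSketchExecutor(StandInBaseExecutor):
    """Hypothetical executor that overrides only the two mandatory methods."""

    def __init__(self) -> None:
        super().__init__()
        self.pool = ThreadPoolExecutor(max_workers=2)
        self.futures: dict[str, Future] = {}

    def execute_async(self, key: str, command: list[str]) -> None:
        # Submit the work and return immediately without waiting for it.
        self.futures[key] = self.pool.submit(fake_run, command)

    def sync(self) -> None:
        # Record a final state for every task that has finished.
        for key, future in list(self.futures.items()):
            if future.done():
                del self.futures[key]
                failed = future.exception() is not None
                self.event_buffer[key] = "failed" if failed else "success"
```

In this sketch ``execute_async`` only submits work, and ``sync`` is the single place where task state is updated, which mirrors the split of responsibilities described in the list above.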
+
+
+Optional Interface Methods to Implement
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following methods don't need to be overridden to have a functional Airflow
executor. However, some powerful capabilities and stability can come from
implementing them:
+
+* ``start``: The Airflow scheduler (and backfill) job will call this method
after it initializes the executor object. Any additional setup required by the
executor can be completed here.
+* ``end``: The Airflow scheduler (and backfill) job will call this method as it is
tearing down. Any synchronous cleanup required to finish running jobs should be
done here.
+* ``terminate``: More forcefully stop the executor, even killing/stopping
in-flight tasks instead of synchronously waiting for completion.
+* ``cleanup_stuck_queued_tasks``: Tasks that are stuck in the queued state for
longer than ``task_queued_timeout`` are collected by the scheduler and passed
to the executor via this method, giving it an opportunity to handle them
(perform any graceful cleanup/teardown). The method returns the Task Instances
so that a warning message can be displayed to users.
+* ``try_adopt_task_instances``: Tasks that have been abandoned (e.g. from a
scheduler job that died) are provided to the executor to adopt or otherwise
handle them via this method. Any tasks that cannot be adopted (by default the
BaseExecutor assumes all cannot be adopted) should be returned.
+* ``get_cli_commands``: Executors may vend CLI commands to users by
implementing this method. See the `CLI`_ section below for more details.
+* ``get_task_log``: Executors may vend log messages to Airflow task logs by
implementing this method. See the `Logging`_ section below for more details.
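
As a rough illustration of the ``try_adopt_task_instances`` contract described above (adopt what you can, return the rest), consider the following self-contained sketch. ``FakeTaskInstance``, ``AdoptingSketchExecutor`` and ``live_job_ids`` are hypothetical names for this example only, and matching on an ``external_executor_id`` field is an assumption about how an executor might recognise its own jobs, not a statement of how any existing executor does it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FakeTaskInstance:
    """Hypothetical stand-in for an Airflow TaskInstance."""

    task_id: str
    external_executor_id: str | None = None


class AdoptingSketchExecutor:
    """Hypothetical executor that can adopt tasks whose jobs are still alive."""

    def __init__(self, live_job_ids: set[str]) -> None:
        # Identifiers of jobs this executor can still see in its backend.
        self.live_job_ids = live_job_ids
        self.running: set[str] = set()

    def try_adopt_task_instances(
        self, tis: list[FakeTaskInstance]
    ) -> list[FakeTaskInstance]:
        not_adopted = []
        for ti in tis:
            if ti.external_executor_id in self.live_job_ids:
                # The job is still known, so start tracking it again.
                self.running.add(ti.task_id)
            else:
                # Unknown job: hand the task back to the caller.
                not_adopted.append(ti)
        return not_adopted
```

The important part is the return value: everything the executor could not adopt is returned, so the caller can decide what to do with those tasks.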
+
+Compatibility Attributes
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``BaseExecutor`` class interface contains a set of attributes that Airflow
core code uses to check the features that your executor is compatible with.
When writing your own Airflow executor be sure to set these correctly for your
use case. Each attribute is simply a boolean to enable/disable a feature or
indicate that a feature is supported/unsupported by the executor:
+
+* ``supports_pickling``: Whether or not the executor supports reading pickled
DAGs from the Database before execution (rather than reading the DAG definition
from the file system).
+* ``supports_sentry``: Whether or not the executor supports `Sentry
<https://sentry.io>`_.
+
+* ``is_local``: Whether or not the executor is remote or local. See the
`Executor Types`_ section above.
+* ``is_single_threaded``: Whether or not the executor is single threaded. This
is particularly relevant to what database backends are supported. Single
threaded executors can run with any backend, including SQLite.
+* ``is_production``: Whether or not the executor should be used for production
purposes. A UI message is displayed to users when they are using a
non-production ready executor.
+
+* ``change_sensor_mode_to_reschedule``: Whether or not sensors should be
switched from poke mode to reschedule mode for this executor. Running Airflow
sensors in poke mode can block the thread of executors and in some cases
Airflow.
+* ``serve_logs``: Whether or not the executor supports serving logs. See
:doc:`/administration-and-deployment/logging-monitoring/logging-tasks`.
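
To make the role of these boolean attributes concrete, here is a small self-contained sketch. The attribute names come from the list above; the values, the ``SketchExecutor`` class and the ``check_database_backend`` helper are hypothetical. The helper also assumes that the sentence about single-threaded executors implies that other executors should not be paired with SQLite.

```python
class SketchExecutor:
    """Hypothetical executor; the values are chosen for illustration only."""

    supports_pickling = False
    supports_sentry = False
    is_local = True
    is_single_threaded = False
    is_production = True
    change_sensor_mode_to_reschedule = False
    serve_logs = False


def check_database_backend(executor_cls: type, connection_url: str) -> None:
    """Reject SQLite unless the executor declares itself single threaded."""
    uses_sqlite = connection_url.startswith("sqlite")
    if uses_sqlite and not executor_cls.is_single_threaded:
        raise ValueError(f"{executor_cls.__name__} cannot be used with SQLite")
```

Core code can read such flags from the class without instantiating the executor, which is one reason plain class-level booleans are a convenient way to advertise capabilities.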
+
+CLI
+^^^
+
+Executors may vend CLI commands which will be included in the ``airflow``
command line tool by implementing the ``get_cli_commands`` method. Executors
such as ``CeleryExecutor`` and ``KubernetesExecutor``, for example, make use of
this mechanism. The commands can be used to set up required workers, initialize
the environment or set other configuration. Commands are only vended for the
currently configured executor. A pseudo-code example of implementing CLI
command vending from an executor can be seen below:
+
+.. code-block:: python
+
+ @staticmethod
+ def get_cli_commands() -> list[GroupCommand]:
+ sub_commands = [
+ ActionCommand(
+ name="command_name",
+ help="Description of what this specific command does",
+ func=lazy_load_command("path.to.python.function.for.command"),
+ args=(),
+ ),
+ ]
+
+ return [
+ GroupCommand(
+ name="my_cool_executor",
+ help="Description of what this group of commands do",
+ subcommands=sub_commands,
+ ),
+ ]
+
+.. note::
+ There are no strict rules in place, currently, for the Airflow command
namespace. It is up to developers to use names for their CLI commands that are
sufficiently unique so as to not cause conflicts with other Airflow executors
or components.
Review Comment:
As a comma-abuser, I feel bad about this suggestion, but I do think it flows
a bit better.