wolfier commented on a change in pull request #15183:
URL: https://github.com/apache/airflow/pull/15183#discussion_r624533304
##########
File path: docs/apache-airflow/faq.rst
##########
@@ -159,72 +215,194 @@ simple dictionary.
other_dag_id = f'bar_{i}'
globals()[other_dag_id] = create_dag(other_dag_id)
-What are all the ``airflow tasks run`` commands in my process list?
--------------------------------------------------------------------
+Even though Airflow supports multiple DAG definitions per Python file,
dynamically generated or otherwise, it is not
+recommended, as Airflow aims for better isolation between DAGs from a fault
and deployment perspective, and multiple
+DAGs in the same file go against that.
-There are many layers of ``airflow tasks run`` commands, meaning it can call
itself.
-- Basic ``airflow tasks run``: fires up an executor, and tell it to run an
- ``airflow tasks run --local`` command. If using Celery, this means it puts a
- command in the queue for it to run remotely on the worker. If using
- LocalExecutor, that translates into running it in a subprocess pool.
-- Local ``airflow tasks run --local``: starts an ``airflow tasks run --raw``
- command (described below) as a subprocess and is in charge of
- emitting heartbeats, listening for external kill signals
- and ensures some cleanup takes place if the subprocess fails.
-- Raw ``airflow tasks run --raw`` runs the actual operator's execute method and
- performs the actual work.
+Is top level Python code allowed?
+----------------------------------
+While it is not recommended to write any code outside of defining Airflow
constructs, Airflow does support
+arbitrary Python code as long as it does not break the DAG file processor or
prolong file processing time past the
+:ref:`config:core__dagbag_import_timeout` value.
-How can my airflow dag run faster?
-----------------------------------
+A common example is exceeding the time limit when building a dynamic
DAG, which usually requires querying data
+from another service such as a database. At the same time, the requested service
is swamped with requests from DAG file
+processors for the data needed to process the file. These unintended
interactions may cause the service to deteriorate
+and eventually cause DAG file processing to fail.
-There are a few variables we can control to improve airflow dag performance:
+Refer to :ref:`DAG writing best practices<best_practice:writing_a_dag>` for
more information.
-- ``parallelism``: This variable controls the number of task instances that
runs simultaneously across the whole Airflow cluster. User could increase the
``parallelism`` variable in the ``airflow.cfg``.
-- ``concurrency``: The Airflow scheduler will run no more than ``concurrency``
task instances for your DAG at any given time. Concurrency is defined in your
Airflow DAG. If you do not set the concurrency on your DAG, the scheduler will
use the default value from the ``dag_concurrency`` entry in your
``airflow.cfg``.
-- ``task_concurrency``: This variable controls the number of concurrent
running task instances across ``dag_runs`` per task.
-- ``max_active_runs``: the Airflow scheduler will run no more than
``max_active_runs`` DagRuns of your DAG at a given time. If you do not set the
``max_active_runs`` in your DAG, the scheduler will use the default value from
the ``max_active_runs_per_dag`` entry in your ``airflow.cfg``.
-- ``pool``: This variable controls the number of concurrent running task
instances assigned to the pool.
-How can we reduce the airflow UI page load time?
-------------------------------------------------
+Do Macros resolve in another Jinja template?
+---------------------------------------------
-If your dag takes long time to load, you could reduce the value of
``default_dag_run_display_number`` configuration in ``airflow.cfg`` to a
smaller value. This configurable controls the number of dag run to show in UI
with default value 25.
+It is not possible to render :ref:`Macros<macros>` or any Jinja template
within another Jinja template. This is
+commonly attempted in ``user_defined_macros``.
+.. code-block:: python
-How to fix Exception: Global variable explicit_defaults_for_timestamp needs to
be on (1)?
------------------------------------------------------------------------------------------
+    dag = DAG(
+        ...
+        user_defined_macros={
+            'my_custom_macro': 'day={{ ds }}'
+        }
+    )
-This means ``explicit_defaults_for_timestamp`` is disabled in your mysql
server and you need to enable it by:
+    bo = BashOperator(
+        task_id='my_task',
+        bash_command="echo {{ my_custom_macro }}",
+        dag=dag
+    )
-#. Set ``explicit_defaults_for_timestamp = 1`` under the ``mysqld`` section in
your ``my.cnf`` file.
-#. Restart the Mysql server.
+This will echo "day={{ ds }}" instead of "day=2020-01-01" for a dagrun with
the execution date 2020-01-01 00:00:00.
+.. code-block:: python
-How to reduce airflow dag scheduling latency in production?
------------------------------------------------------------
+    bo = BashOperator(
+        task_id='my_task',
+        bash_command="echo day={{ ds }}",
+        dag=dag
+    )
-Airflow 2 has low DAG scheduling latency out of the box (particularly when
compared with Airflow 1.10.x),
-however if you need more throughput you can :ref:`start multiple
schedulers<scheduler:ha>`.
+By using the ``ds`` macro directly in the templated field, the rendered value
is "day=2020-01-01".
-Why next_ds or prev_ds might not contain expected values?
----------------------------------------------------------
+
+Why might ``next_ds`` or ``prev_ds`` not contain expected values?
+------------------------------------------------------------------
- When scheduling DAG, the ``next_ds`` ``next_ds_nodash`` ``prev_ds``
``prev_ds_nodash`` are calculated using
``execution_date`` and ``schedule_interval``. If you set
``schedule_interval`` as ``None`` or ``@once``,
the ``next_ds``, ``next_ds_nodash``, ``prev_ds``, ``prev_ds_nodash`` values
will be set to ``None``.
- When manually triggering DAG, the schedule will be ignored, and ``prev_ds ==
next_ds == ds``
+
+Task execution interactions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+What does TemplateNotFound mean?
+---------------------------------
+
+TemplateNotFound errors are usually due to a mismatch with user expectations
when passing a path to an operator
+that triggers Jinja templating. A common occurrence is with
:ref:`BashOperators<howto/operator:BashOperator>`.
+
+Another commonly missed fact is that the files are resolved relative to where
the pipeline file lives. You can add
+other directories to the ``template_searchpath`` of the DAG object to allow
for other non-relative locations.
+
+
+How to trigger tasks based on another task's failure?
+-----------------------------------------------------
+
+For tasks that are related through a dependency, you can set the
``trigger_rule`` to ``TriggerRule.ALL_FAILED`` if the
+task execution depends on the failure of ALL its upstream tasks, or
``TriggerRule.ONE_FAILED`` if it depends on the failure of just one of the
+upstream tasks.
+
+.. code-block:: python
+
+    from airflow.models import DAG
+    from airflow.operators.python import PythonOperator
+    from airflow.utils.trigger_rule import TriggerRule
+
+    parent = PythonOperator(
+        task_id='a',
+        python_callable=lambda x: 1,
+        dag=dag
+    )
+
+    child = PythonOperator(
+        task_id='b',
+        python_callable=lambda: 1,
+        trigger_rule=TriggerRule.ALL_FAILED,
+        dag=dag
+    )
+
+    parent >> child
Review comment:
Sounds good! Will convert to task decorator.
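
While converting, here is how I read the two trigger rules that the FAQ text describes, as a plain-Python sketch. This is a toy model only, not Airflow's scheduler logic: `State` and `should_trigger` are hypothetical names made up for illustration, only two task states are modelled, and the rule strings simply mirror the names `ALL_FAILED` and `ONE_FAILED`.

```python
from enum import Enum
from typing import Iterable


class State(Enum):
    """Hypothetical, simplified task states (Airflow has many more)."""

    SUCCESS = "success"
    FAILED = "failed"


def should_trigger(rule: str, upstream: Iterable[State]) -> bool:
    """Toy model: would a task with this trigger rule run, given its upstream states?"""
    states = list(upstream)
    if rule == "all_failed":
        # Every upstream task must have failed (and there must be at least one).
        return bool(states) and all(s is State.FAILED for s in states)
    if rule == "one_failed":
        # A single failed upstream task is enough.
        return any(s is State.FAILED for s in states)
    raise ValueError(f"unsupported rule in this sketch: {rule!r}")
```

With one failed and one successful upstream task, `should_trigger("all_failed", ...)` is false while `should_trigger("one_failed", ...)` is true, which is the distinction the FAQ paragraph is trying to draw.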