wolfier commented on a change in pull request #15183:
URL: https://github.com/apache/airflow/pull/15183#discussion_r617705667
##########
File path: docs/apache-airflow/faq.rst
##########
@@ -159,72 +210,206 @@ simple dictionary.
other_dag_id = f'bar_{i}'
globals()[other_dag_id] = create_dag(other_dag_id)
-What are all the ``airflow tasks run`` commands in my process list?
--------------------------------------------------------------------
+Even though Airflow supports multiple DAG definitions per Python file, whether dynamically generated or not, it is not
+recommended, as Airflow prefers better isolation between DAGs from a fault and deployment perspective, and multiple
+DAGs in the same file go against that.
+
-There are many layers of ``airflow tasks run`` commands, meaning it can call itself.
-- Basic ``airflow tasks run``: fires up an executor, and tell it to run an
- ``airflow tasks run --local`` command. If using Celery, this means it puts a
- command in the queue for it to run remotely on the worker. If using
- LocalExecutor, that translates into running it in a subprocess pool.
-- Local ``airflow tasks run --local``: starts an ``airflow tasks run --raw``
- command (described below) as a subprocess and is in charge of
- emitting heartbeats, listening for external kill signals
- and ensures some cleanup takes place if the subprocess fails.
-- Raw ``airflow tasks run --raw`` runs the actual operator's execute method and
- performs the actual work.
+Is top-level Python code allowed?
+---------------------------------
+
+While it is not recommended to write any code outside of defining Airflow constructs, Airflow does support
+arbitrary Python code, as long as it does not break the DAG file processor or prolong file processing time past the
+:ref:`config:core__dagbag_import_timeout` value.
+
-How can my airflow dag run faster?
-----------------------------------
+A common example is exceeding this time limit when building a dynamic DAG, which usually requires querying data
+from another service such as a database. Meanwhile, every DAG file processor queries that service each time it
+parses the file, which can swamp it with requests. These unintended interactions may degrade the service and
+eventually cause DAG file processing to fail.
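+
+As an illustrative sketch of this anti-pattern (the hook, connection id, and query here are hypothetical, and
+``create_dag`` stands in for a DAG factory like the one above), top-level code of this shape runs on every parse:
+
+.. code-block:: python
+
+    from airflow.providers.postgres.hooks.postgres import PostgresHook
+
+    # This query runs every time the file is parsed, not only when a task
+    # executes, so it can swamp the database and exceed the
+    # core.dagbag_import_timeout limit.
+    rows = PostgresHook(postgres_conn_id='customers_db').get_records(
+        'SELECT customer_id FROM customers'
+    )
+
+    for (customer_id,) in rows:
+        dag_id = f'process_customer_{customer_id}'
+        globals()[dag_id] = create_dag(dag_id)
+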
-There are a few variables we can control to improve airflow dag performance:
+Refer to :ref:`DAG writing best practices<best_practice:writing_a_dag>` for more information.
+
-- ``parallelism``: This variable controls the number of task instances that runs simultaneously across the whole Airflow cluster. User could increase the ``parallelism`` variable in the ``airflow.cfg``.
-- ``concurrency``: The Airflow scheduler will run no more than ``concurrency`` task instances for your DAG at any given time. Concurrency is defined in your Airflow DAG. If you do not set the concurrency on your DAG, the scheduler will use the default value from the ``dag_concurrency`` entry in your ``airflow.cfg``.
-- ``task_concurrency``: This variable controls the number of concurrent running task instances across ``dag_runs`` per task.
-- ``max_active_runs``: the Airflow scheduler will run no more than ``max_active_runs`` DagRuns of your DAG at a given time. If you do not set the ``max_active_runs`` in your DAG, the scheduler will use the default value from the ``max_active_runs_per_dag`` entry in your ``airflow.cfg``.
-- ``pool``: This variable controls the number of concurrent running task instances assigned to the pool.
-How can we reduce the airflow UI page load time?
-------------------------------------------------
+Do Macros resolve in another Jinja template?
+--------------------------------------------
+
-If your dag takes long time to load, you could reduce the value of ``default_dag_run_display_number`` configuration in ``airflow.cfg`` to a smaller value. This configurable controls the number of dag run to show in UI with default value 25.
+It is not possible to render :ref:`Macros<macros>` or any Jinja template within another Jinja template. This is
+commonly attempted in ``user_defined_macros``.
+
+.. code-block:: python
+
-How to fix Exception: Global variable explicit_defaults_for_timestamp needs to be on (1)?
------------------------------------------------------------------------------------------
+    dag = DAG(
+        ...
+        user_defined_macros={
+            'my_custom_macro': 'day={{ ds }}'
+        }
+    )
+
-This means ``explicit_defaults_for_timestamp`` is disabled in your mysql server and you need to enable it by:
+    bo = BashOperator(
+        task_id='my_task',
+        bash_command="echo {{ my_custom_macro }}",
+        dag=dag
+    )
-#. Set ``explicit_defaults_for_timestamp = 1`` under the ``mysqld`` section in your ``my.cnf`` file.
-#. Restart the Mysql server.
+This will echo "day={{ ds }}" instead of "day=2020-01-01" for a DAG run with the execution date 2020-01-01 00:00:00.
+
+.. code-block:: python
+
-How to reduce airflow dag scheduling latency in production?
------------------------------------------------------------
+    bo = BashOperator(
+        task_id='my_task',
+        bash_command="echo day={{ ds }}",
+        dag=dag
+    )
+
+By using the ``ds`` macro directly in the templated field, the rendered value is "day=2020-01-01".
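+
+If reusable logic is still wanted, one possible alternative (a sketch, not part of the original FAQ) is to make the
+user-defined macro a callable, so ``ds`` is passed in at render time rather than nested inside another template:
+
+.. code-block:: python
+
+    # A callable macro is invoked during template rendering, so it can
+    # receive already-resolved values such as ds as arguments.
+    def day_param(ds):
+        return f'day={ds}'
+
+    dag = DAG(
+        ...
+        user_defined_macros={
+            'day_param': day_param
+        }
+    )
+
+    bo = BashOperator(
+        task_id='my_task',
+        bash_command="echo {{ day_param(ds) }}",
+        dag=dag
+    )
+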
-Airflow 2 has low DAG scheduling latency out of the box (particularly when compared with Airflow 1.10.x),
-however if you need more throughput you can :ref:`start multiple schedulers<scheduler:ha>`.
-Why next_ds or prev_ds might not contain expected values?
----------------------------------------------------------
+Why might ``next_ds`` or ``prev_ds`` not contain expected values?
+-------------------------------------------------------------------
+
- When scheduling DAG, the ``next_ds`` ``next_ds_nodash`` ``prev_ds`` ``prev_ds_nodash`` are calculated using
  ``execution_date`` and ``schedule_interval``. If you set ``schedule_interval`` as ``None`` or ``@once``,
  the ``next_ds``, ``next_ds_nodash``, ``prev_ds``, ``prev_ds_nodash`` values will be set to ``None``.
- When manually triggering DAG, the schedule will be ignored, and ``prev_ds == next_ds == ds``
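+
+As an illustrative sketch (the task here is hypothetical), printing these values shows the difference between
+scheduled and manually triggered runs:
+
+.. code-block:: python
+
+    # For a @daily DAG run with execution date 2020-01-01 this prints:
+    #   prev_ds=2019-12-31 ds=2020-01-01 next_ds=2020-01-02
+    # For a manually triggered run, prev_ds == next_ds == ds.
+    print_dates = BashOperator(
+        task_id='print_dates',
+        bash_command='echo "prev_ds={{ prev_ds }} ds={{ ds }} next_ds={{ next_ds }}"',
+        dag=dag,
+    )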
+
+Task execution interactions
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Why does a sensor never complete?
+---------------------------------
+
+When waiting on another task's result, it is recommended to raise the priority of the tasks it will be waiting for.
+Otherwise, scheduling can get deadlocked, with the waiting task taking up all available slots, leaving no slots for the
Review comment:
I will remove this faq for now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]