This is an automated email from the ASF dual-hosted git repository.

ephraimanierobi pushed a commit to branch v2-7-test
in repository https://gitbox.apache.org/repos/asf/airflow.git

commit b83fa2eba9781dd071f9b78043530ef236e47364
Author: Jarek Potiuk <[email protected]>
AuthorDate: Fri Sep 1 15:31:57 2023 +0200

    Explain the users how they can check if python code is top-level (#34006)
    
    Many users have problem with it. Adding the way how they can
    check it easily.
    
    (cherry picked from commit 9702a14dd59fb842eec720594f470b10b57b7a09)
---
 docs/apache-airflow/best-practices.rst | 90 ++++++++++++++++++++++++++++++++++
 1 file changed, 90 insertions(+)

diff --git a/docs/apache-airflow/best-practices.rst b/docs/apache-airflow/best-practices.rst
index 6cb2857d96..c6725c0eb5 100644
--- a/docs/apache-airflow/best-practices.rst
+++ b/docs/apache-airflow/best-practices.rst
@@ -176,6 +176,96 @@ Good example:
 
 In the Bad example, NumPy is imported each time the DAG file is parsed, which will result in suboptimal performance in the DAG file processing. In the Good example, NumPy is only imported when the task is running.
 
+Since it is not always obvious, see the next section to check whether your code is "top-level" code.
+
+How to check if my code is "top-level" code
+-------------------------------------------
+
+To understand whether your code is "top-level" or not, you need to understand some of the
+intricacies of how Python parses files. In general, when Python parses a Python file, it executes
+all the code it sees, except the bodies of functions and methods, which only run when called.
+
+There are a number of special cases that are not obvious. For example, top-level code also includes
+any code used to compute the default values of function or method arguments.
+
+However, there is an easy way to check whether your code is "top-level" or not: run your file
+with Python and see whether the piece of code gets executed.
+
+Imagine this code:
+
+.. code-block:: python
+
+  from airflow import DAG
+  from airflow.operators.python import PythonOperator
+  import pendulum
+
+
+  def get_task_id():
+      return "print_array_task"  # <- is that code going to be executed?
+
+
+  def get_array():
+      return [1, 2, 3]  # <- is that code going to be executed?
+
+
+  with DAG(
+      dag_id="example_python_operator",
+      schedule=None,
+      start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
+      catchup=False,
+      tags=["example"],
+  ) as dag:
+
+      operator = PythonOperator(
+          task_id=get_task_id(),
+          python_callable=get_array,
+          dag=dag,
+      )
+
+To check it, add print statements to the code you want to check and run
+``python <my_dag_file>.py``.
+
+
+.. code-block:: python
+
+  from airflow import DAG
+  from airflow.operators.python import PythonOperator
+  import pendulum
+
+
+  def get_task_id():
+      print("Executing 1")
+      return "print_array_task"  # <- is that code going to be executed? YES
+
+
+  def get_array():
+      print("Executing 2")
+      return [1, 2, 3]  # <- is that code going to be executed? NO
+
+
+  with DAG(
+      dag_id="example_python_operator",
+      schedule=None,
+      start_date=pendulum.datetime(2021, 1, 1, tz="UTC"),
+      catchup=False,
+      tags=["example"],
+  ) as dag:
+
+      operator = PythonOperator(
+          task_id=get_task_id(),
+          python_callable=get_array,
+          dag=dag,
+      )
+
+When you execute that code, you will see:
+
+.. code-block:: bash
+
+    root@cf85ab34571e:/opt/airflow# python /files/test_python.py
+    Executing 1
+
+This means that ``get_array`` is not executed as top-level code, but ``get_task_id`` is.
+
 .. _best_practices/dynamic_dag_generation:
 
 Dynamic DAG Generation
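
The print-statement check from the patch can also be done without editing the DAG file. What follows is a minimal sketch of one possible approach, assuming the file's source is available as a string: instead of print statements, it uses Python's standard ``sys.settrace`` hook, which reports every function call made while the module body runs. The helper ``run_and_trace`` and the stand-in source ``DAG_FILE_SOURCE`` are hypothetical names; the Airflow imports are left out so the sketch runs with the standard library only. The stand-in also covers the default-argument case, because the default value of ``size`` is computed when the ``def`` line of ``get_array`` runs.

.. code-block:: python

  import sys


  # Hypothetical stand-in for a DAG file. The Airflow imports and the DAG
  # object are omitted so the sketch runs with the standard library only.
  DAG_FILE_SOURCE = '''
  def get_default_size():
      return 3


  def get_task_id():
      return "print_array_task"


  def get_array(size=get_default_size()):  # default computed at "def" time
      return list(range(1, size + 1))


  task_id = get_task_id()  # called while the module body runs
  python_callable = get_array  # only referenced, not called
  '''


  def run_and_trace(source, filename="<dag_file>"):
      """Execute ``source`` as a module and return, in order, the names of
      the functions defined in it that were called during execution."""
      called = []

      def tracer(frame, event, arg):
          code = frame.f_code
          # Keep "call" events for code compiled from ``source``, skipping
          # the frame of the module body itself.
          if event == "call" and code.co_filename == filename and code.co_name != "<module>":
              called.append(code.co_name)
          return None  # no line-by-line tracing needed

      compiled = compile(source, filename, "exec")
      previous = sys.gettrace()
      sys.settrace(tracer)
      try:
          exec(compiled, {"__name__": "dag_file"})
      finally:
          sys.settrace(previous)  # restore any debugger/coverage tracer
      return called


  top_level_calls = run_and_trace(DAG_FILE_SOURCE)
  print(top_level_calls)  # prints ['get_default_size', 'get_task_id']

With Airflow installed, the same helper could be pointed at a real DAG file by passing ``pathlib.Path(path).read_text()`` as ``source`` and the path as ``filename``. The filename check keeps calls made inside library code (for example inside ``pendulum.datetime``) out of the result, so only functions from the DAG file itself are reported.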
