BasPH commented on code in PR #25121:
URL: https://github.com/apache/airflow/pull/25121#discussion_r923396316

##########
docs/apache-airflow/howto/dynamic-dag-generation.rst:
##########
@@ -140,3 +140,20 @@ Each of them can run separately with related configuration
 .. warning::
   Using this practice, pay attention to "late binding" behaviour in Python loops. See `that GitHub discussion <https://github.com/apache/airflow/discussions/21278#discussioncomment-2103559>`_ for more details
+
+
+Optimizing DAG parsing in workers/Kubernetes Pods

Review Comment:

Hmm I tested this on Local, Celery, and Kubernetes-executor and it only seems to work on the KubernetesExecutor. It seems args are passed differently on Local and Celery-executor.

This is the DAG I'm using:

```python
import datetime
import logging
import sys

from airflow import DAG
from airflow.operators.bash import BashOperator

logger = logging.getLogger(__name__)

current_dag = None
if len(sys.argv) > 3:
    current_dag = sys.argv[3]

for thing in range(100):
    dag_id = f"generated_dag_{thing}"
    logger.info(f"CURRENT DAG = {current_dag}, DAG_ID = {dag_id}, sys.argv = {sys.argv}")
    if current_dag is not None and current_dag != dag_id:
        continue

    print(f"GENERATING DAG {dag_id}. Args = {sys.argv}")
    with DAG(dag_id=dag_id, schedule_interval="@daily", start_date=datetime.datetime(2022, 7, 1)) as dag:
        BashOperator(task_id="hello", bash_command="echo bla")

    globals()[dag_id] = dag
```

Here's a snippet of the output on CeleryExecutor:

```text
[2022-07-18 13:46:24,114: WARNING/ForkPoolWorker-15] GENERATING DAG generated_dag_65. Args = ['/home/airflow/.local/bin/airflow', 'celery', 'worker']
[2022-07-18 13:46:24,116: INFO/ForkPoolWorker-15] CURRENT DAG = None, DAG_ID = generated_dag_66, sys.argv = ['/home/airflow/.local/bin/airflow', 'celery', 'worker']
```

And on KubernetesExecutor:

```text
[2022-07-18 13:39:56,709] {scheduler_worker_distinction.py:20} INFO - CURRENT DAG = generated_dag_1, DAG_ID = generated_dag_99, sys.argv = ['/usr/local/bin/airflow', 'tasks', 'run', 'generated_dag_1', 'hello', 'scheduled__2022-07-13T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/scheduler_worker_distinction.py']
```

I like this optimization, but if it only works on the KubernetesExecutor we should be explicit about it.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
