BasPH commented on code in PR #25121:
URL: https://github.com/apache/airflow/pull/25121#discussion_r923396316


##########
docs/apache-airflow/howto/dynamic-dag-generation.rst:
##########
@@ -140,3 +140,20 @@ Each of them can run separately with related configuration
 
 .. warning::
  Using this practice, pay attention to "late binding" behaviour in Python loops. See `that GitHub discussion <https://github.com/apache/airflow/discussions/21278#discussioncomment-2103559>`_
 for more details
+
+
+Optimizing DAG parsing in workers/Kubernetes Pods

Review Comment:
   Hmm, I tested this with the LocalExecutor, CeleryExecutor, and KubernetesExecutor, and it only seems to work with the KubernetesExecutor. It looks like the args are passed differently on the Local and Celery executors. This is the DAG I'm using:
   
   ```python
   import datetime
   import logging
   import sys
   
   from airflow import DAG
   from airflow.operators.bash import BashOperator
   
   logger = logging.getLogger(__name__)
   
   current_dag = None
   if len(sys.argv) > 3:
       current_dag = sys.argv[3]
   
   for thing in range(100):
       dag_id = f"generated_dag_{thing}"
       logger.info(f"CURRENT DAG = {current_dag}, DAG_ID = {dag_id}, sys.argv = 
{sys.argv}")
       if current_dag is not None and current_dag != dag_id:
           continue
   
       print(f"GENERATING DAG {dag_id}. Args = {sys.argv}")
       with DAG(dag_id=dag_id, schedule_interval="@daily", 
start_date=datetime.datetime(2022, 7, 1)) as dag:
           BashOperator(task_id="hello", bash_command="echo bla")
           globals()[dag_id] = dag
   ```
   
   Here's a snippet of the output with the CeleryExecutor:
   
   ```text
   [2022-07-18 13:46:24,114: WARNING/ForkPoolWorker-15] GENERATING DAG generated_dag_65. Args = ['/home/airflow/.local/bin/airflow', 'celery', 'worker']
   [2022-07-18 13:46:24,116: INFO/ForkPoolWorker-15] CURRENT DAG = None, DAG_ID = generated_dag_66, sys.argv = ['/home/airflow/.local/bin/airflow', 'celery', 'worker']
   ```
   
   And with the KubernetesExecutor:
   
   ```text
   [2022-07-18 13:39:56,709] {scheduler_worker_distinction.py:20} INFO - CURRENT DAG = generated_dag_1, DAG_ID = generated_dag_99, sys.argv = ['/usr/local/bin/airflow', 'tasks', 'run', 'generated_dag_1', 'hello', 'scheduled__2022-07-13T00:00:00+00:00', '--local', '--subdir', 'DAGS_FOLDER/scheduler_worker_distinction.py']
   ```
   
   I like this optimization, but if it only works with the KubernetesExecutor, we should be explicit about it.
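
   For what it's worth, here's a minimal sketch of what I have in mind (just a suggestion, not something this PR documents). It only trusts `sys.argv` when the worker was actually started to run a single task, based on the argv patterns in the output above:

   ```python
   import sys


   def get_current_dag_id():
       """Return the DAG id being run, or None if every DAG should be generated.

       Assumption: the DAG id only shows up in sys.argv when the process was
       started as `airflow tasks run <dag_id> ...` (the KubernetesExecutor case
       above). Celery and Local workers start as e.g. `airflow celery worker`,
       so we fall back to generating all DAGs there.
       """
       if len(sys.argv) > 3 and sys.argv[1:3] == ["tasks", "run"]:
           return sys.argv[3]
       return None


   current_dag = get_current_dag_id()
   ```

   That keeps the generate-everything behaviour on the other executors and only takes the fast path when the task args are actually available.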


