anitakar commented on issue #15306: URL: https://github.com/apache/airflow/issues/15306#issuecomment-848862444
At least in Airflow 1.10.15, not using DAG serialization makes task execution very inefficient: the whole DagBag is parsed locally before each task is executed. Here is the relevant call path from the code/stack trace that shows this:

1. The `airflow worker` command starts `celery_executor` (https://github.com/apache/airflow/blob/1.10.15/airflow/bin/cli.py#L1554).
2. The worker then executes the `airflow tasks run` command set by the scheduler (https://github.com/apache/airflow/blob/1.10.15/airflow/bin/cli.py#L596).
3. That command calls `get_dag` without specifying `store_serialized_dags`, which is `False` by default (https://github.com/apache/airflow/blob/1.10.15/airflow/bin/cli.py#L618).
4. `get_dag` then parses the whole DAGs directory locally on the worker (https://github.com/apache/airflow/blob/5786dcdc392f7a2649f398353a0beebef01c428e/airflow/bin/cli.py#L164).

Parsing every DAG before each task execution seems very inefficient. I have already committed a few fixes to DAG serialization, and I would be happy to fix at least the task-execution path on the worker. @kaxil @potiuk WDYT?
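To make the cost difference concrete, here is a toy model in plain Python. It is not Airflow code: the DAG folder, the serialized store, and the functions `get_dag_by_full_parse` and `get_dag_from_store` are hypothetical, and `json.loads` stands in for both importing a DAG file and deserializing a stored DAG. It only counts how many DAG definitions each path has to load before the task's own DAG is available.

```python
# Hypothetical toy model, NOT Airflow code: it only counts how many DAG
# definitions each task-startup path has to load. json.loads stands in for
# both "parse a DAG file" and "deserialize a stored DAG".
import json
from typing import Dict, Tuple


def get_dag_by_full_parse(dag_id: str, dag_folder: Dict[str, str]) -> Tuple[dict, int]:
    """Non-serialized path (hypothetical): parse every file, then look up one DAG."""
    dags = {}
    loads = 0
    for source in dag_folder.values():
        dag = json.loads(source)  # stand-in for importing one DAG file
        dags[dag["dag_id"]] = dag
        loads += 1
    return dags[dag_id], loads


def get_dag_from_store(dag_id: str, serialized_store: Dict[str, str]) -> Tuple[dict, int]:
    """Serialized path (hypothetical): fetch and deserialize only the needed DAG."""
    return json.loads(serialized_store[dag_id]), 1


# A folder of 500 single-DAG files, and a store of the same DAGs keyed by dag_id.
folder = {f"dag_{i}.py": json.dumps({"dag_id": f"dag_{i}"}) for i in range(500)}
store = {json.loads(src)["dag_id"]: src for src in folder.values()}

dag, loads = get_dag_by_full_parse("dag_42", folder)
print(dag["dag_id"], loads)  # dag_42 500
dag, loads = get_dag_from_store("dag_42", store)
print(dag["dag_id"], loads)  # dag_42 1
```

The model only illustrates scaling: under these assumptions the full-parse path does work proportional to the number of DAG files for every task instance, while the serialized path does a constant amount of work per task (ignoring the cost of the database round trip).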
