Hi Airflow Community,
We (Airbnb) would like to open source an internally developed feature: streaming deserialization of the Airflow task in the `airflow run` process, which avoids DAG parsing. This has been running in our production for around three quarters and has been very stable. It brought the average task start time down from 24 seconds to 3 seconds (across all three of our Airflow clusters) and reduced peak memory usage by around 45%. I will create an AIP after the discussion.

Thanks,
Ping

Motivation

To run a task, Airflow currently needs to parse the DAG file twice: once in the `airflow run local` process and once in `airflow run raw`. This wastes memory, doubling the DAG-parsing RAM. During peak hours, the CPU can spike for a long time due to the double parsing, slowing down task start times and even causing DAG parsing timeouts. This is especially true when the cluster contains large, complex DAGs.

What change do you propose to make?

The DAG parsing in `airflow run local` can be removed by reading the DAG from the serialized_dag table. Memory usage can be reduced further by deserializing only the target task in a streaming way, instead of deserializing the whole DAG object.
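
To make the idea concrete, here is a minimal sketch of deserializing a single task out of a serialized DAG payload. The function name and the payload layout (`{"dag": {"tasks": [...]}}`) are illustrative assumptions modeled loosely on the serialized_dag table's JSON, not actual Airflow internals; a real streaming implementation would also use an incremental JSON parser (e.g. ijson) so the full document never materializes, whereas `json.loads` below still reads the whole payload and only skips building the full DAG object.

```python
import json


def deserialize_single_task(serialized_dag_json: str, task_id: str) -> dict:
    """Return the serialized dict for one task without building the whole DAG.

    Hypothetical sketch: assumes the payload layout
    {"dag": {"tasks": [{"task_id": ...}, ...]}}, which mirrors (but is not
    guaranteed to match) Airflow's serialized_dag table format.
    """
    data = json.loads(serialized_dag_json)
    for task in data["dag"]["tasks"]:
        if task.get("task_id") == task_id:
            # Only this one task would be turned into an operator object;
            # the sibling tasks are never deserialized.
            return task
    raise KeyError(f"task {task_id!r} not found in serialized DAG")


# Usage: fetch only the "load" task from a two-task DAG.
payload = json.dumps(
    {"dag": {"tasks": [{"task_id": "extract"}, {"task_id": "load"}]}}
)
print(deserialize_single_task(payload, "load"))  # → {'task_id': 'load'}
```

The payoff is that `airflow run` only pays the deserialization cost of one task, rather than materializing every operator in a large DAG.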
