GitHub user erl987 edited a discussion: Airflow with CeleryExecutor becomes extremely slow when executing a long-running task
# Task to solve with Airflow

Provide a pipeline that is triggered by external events (through the Airflow REST API), after which a main simulation task runs for 1–5 hours. Afterward, a number of post-processing steps are performed on the simulation results, and in a final task the results are uploaded to a database. The pipeline will be triggered several times per hour. It should be as standard as possible to allow for operation and maintenance by non-simulation domain staff, and the post-processing tasks should be easy to change without having to deal with the details of the simulation task. The number of workers should be easy to change, possibly using multiple machines.

To me, `Airflow` with `Celery` seems to fulfill all these requirements, and therefore I am trying to set up a demonstration system.

# Setup

* latest `Airflow==2.10.3` with `CeleryExecutor`
* `docker-compose` stack (from here: https://airflow.apache.org/docs/apache-airflow/stable/howto/docker-compose/index.html#fetching-docker-compose-yaml)
* Ubuntu 24.04 LTS VM in the Azure cloud (16 GB memory, ca. 100 GB disk)

# Problem

The problem I experience is as follows:

* trigger a single execution of a long-running simulation task (> 1 hour, some GB memory consumption)
* at first, the task executes normally; only the worker process constantly uses 100% of a single CPU core
* **after ca. one hour, the `triggerer` container suddenly logs a lot of messages like this: `INFO - Triggerer's async thread was blocked for 0.27 seconds, likely a badly-written trigger.`**
* the memory usage of the machine is still only at ca. 80% (of 16 GB) at that time, the CPU load at ca. 50%
* no problems are reported on the webpage, and the task logs still show progress
* there are a lot of `celery` and `airflow` processes with a high CPU load, **the actual worker process only has a low CPU load and continues extremely slowly**, and the webpage is also very sluggish

## DAG to reproduce the problem:

```python
from datetime import timedelta

from airflow import DAG
from airflow.decorators import task

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=1),
}

# No schedule: the DAG only runs when it is triggered manually or through the REST API
dag = DAG(
    'task_mock',
    default_args=default_args,
    description='A mock for resource intensive calculations',
    schedule_interval=None,
    start_date=None,
    tags=['mock'],
)


@task(task_id="resource_intensive_calculation_mock", dag=dag)
def task_mock():
    size_in_gb = 8
    large_array = allocate_large_memory(size_in_gb)
    if large_array is not None:
        process_indefinitely(large_array)


def allocate_large_memory(size_in_gb):
    # Imported inside the function so that numpy is only loaded when the task runs
    import numpy as np

    # Convert GB to number of float64 elements (1 float64 is 8 bytes)
    num_elements = size_in_gb * (1024 ** 3) // 8
    try:
        # Allocate a large numpy array
        large_array = np.zeros(num_elements, dtype=np.float64)
        print(f"Allocated array with {num_elements} elements (~{size_in_gb} GB).")
        return large_array
    except MemoryError:
        print("Memory allocation failed. Reduce the size.")
        return None


def process_indefinitely(arr):
    import numpy as np

    iteration = 0
    # Never terminates: keeps one CPU core busy until the task is stopped externally
    while True:
        # Perform some operations on the array
        arr += 1
        arr *= 2
        arr /= 2
        arr -= 1
        iteration += 1
        if iteration % 10 == 0:
            print(f"Iteration {iteration}: Array sum = {np.sum(arr)}")


task_mock()
```

This is a mock task reproducing the behavior of the real-world simulation code in terms of memory (here 8 GB, but it also happens with less) and CPU consumption.

# Questions

* Is it wise to use Airflow in such a use case?
* Should I consider an alternative, and if so, which?
* This behavior feels like a bug to me. Is it?
* Any workarounds to mitigate the problem?

I am happy to provide more details if required, but I think the problem should be reproducible with this DAG on the standard Docker Compose stack.

<img width="844" alt="Screenshot 2024-11-27 at 19 28 20" src="https://github.com/user-attachments/assets/232480f1-6b0c-4521-80cc-313034791ada">
<img width="837" alt="Screenshot 2024-11-27 at 19 28 31" src="https://github.com/user-attachments/assets/814899f6-51fe-4c13-baf2-5a5301fd59b1">

GitHub link: https://github.com/apache/airflow/discussions/44430
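For anyone reproducing this, the external trigger mentioned in the task description could look roughly like the sketch below. It is not part of the original setup and only illustrates the idea: it assumes the stable REST API endpoint `POST /api/v1/dags/{dag_id}/dagRuns`, basic authentication being enabled, and the default `airflow`/`airflow` account of the Docker Compose stack. The base URL, the helper names `build_trigger_request` and `send_trigger_request`, and the `conf` payload are hypothetical.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url, dag_id, conf, username, password):
    """Build (but do not send) a POST request that asks Airflow for a new DAG run."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    body = json.dumps({"conf": conf}).encode("utf-8")
    # Assumption: the API accepts HTTP basic auth with these credentials
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


def send_trigger_request(request, timeout=30):
    """Send the prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With the stack running, a call such as `send_trigger_request(build_trigger_request("http://localhost:8080", "task_mock", {}, "airflow", "airflow"))` should create one run of the mock DAG, provided the DAG is unpaused and the credentials match the local setup.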
