mtraynham opened a new issue #21026: URL: https://github.com/apache/airflow/issues/21026
### Apache Airflow version

2.2.3 (latest released)

### What happened

With a Docker setup (as defined in the link below), the `airflow-worker` service `healthcheck.test` command causes a general increase in memory use over time. This was observed with both Airflow 2.1.4 and 2.2.3.

https://github.com/apache/airflow/blob/958860fcd7c9ecdf60b7ebeef4397b348835c8db/docs/apache-airflow/start/docker-compose.yaml#L131-L137

We observed this in our AWS ECS cluster, where a worker with 0.5 CPU / 1 GB of memory strangely had a task fail at the second dip in memory use, which prompted further investigation. We noticed the metrics page looked like the image below. We then raised the resources to 2 CPU / 4 GB of memory and restarted the service, which still produced a gradual increase in memory.

### What you expected to happen

Memory use should generally not increase while the system is idle; it should spike during the healthcheck and then be released back to the host.

### How to reproduce

We use a modified version of the compose file and instead favor `docker stack`, but the same setup should apply. A slimmed-down compose file is below; it defines two workers, one with a healthcheck and one without. Executing the stack is fairly simple:

```bash
$ docker stack deploy -c docker-compose.yaml airflow
```

A secondary script was written to scrape the Docker statistics at 10-second intervals and write them to a CSV file.
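As a quick sanity check before collecting stats, it is worth confirming that Docker really fires the probe at the configured 10-second interval; `docker inspect` keeps a short health log under `.State.Health`. A minimal sketch — the `health.json` content here is a synthetic sample of what `docker inspect --format '{{json .State.Health}}' <container>` returns, not output captured from a real run:

```bash
# Synthetic sample of `docker inspect --format '{{json .State.Health}}' <container>`.
cat <<'EOF' > health.json
{"Status":"healthy","FailingStreak":0,"Log":[
 {"Start":"2022-01-20T00:00:00Z","End":"2022-01-20T00:00:01Z","ExitCode":0,"Output":"pong"},
 {"Start":"2022-01-20T00:00:10Z","End":"2022-01-20T00:00:11Z","ExitCode":0,"Output":"pong"}]}
EOF

# All probes succeeding (ExitCode 0) while memory still climbs points at the
# cost of running the probe itself rather than at probe failures.
grep -o '"ExitCode":[0-9]*' health.json | sort | uniq -c
```

On a live stack, the real health log for the worker can be fed through the same check by substituting the container name from `docker ps`.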
*collect_stats.sh*

```bash
#!/usr/bin/env sh
healthcheck=$(docker ps --format "{{.Names}}" | grep worker_healthcheck)
no_healthcheck=$(docker ps --format "{{.Names}}" | grep worker_no_healthcheck)
containers="${healthcheck} ${no_healthcheck}"

echo "Date,Container,CPU Percent,Mem Usage,Mem Percent"
while true; do
    time=$(date --utc +%FT%T%Z)
    docker stats ${containers} \
        --format "table {{.Name}},{{.CPUPerc}},{{.MemUsage}},{{.MemPerc}}" \
        --no-stream \
        | grep worker \
        | awk -vT="${time}," '{ print T $0 }'
    sleep 10
done
```

```yaml
---
version: '3.7'

networks:
  net:
    driver: overlay
    attachable: true

volumes:
  postgres-data:
  redis-data:

services:
  postgres:
    image: postgres:13.2-alpine
    volumes:
      - postgres-data:/var/lib/postgresql/data
    environment:
      POSTGRES_USER: airflow
      POSTGRES_PASSWORD: airflow
      POSTGRES_DB: airflow
    healthcheck:
      test: pg_isready -U airflow -d airflow
      interval: 10s
      timeout: 3s
      start_period: 15s
    ports:
      - '5432:5432'
    networks:
      - net

  redis:
    image: redis:6.2
    volumes:
      - redis-data:/data
    healthcheck:
      test: redis-cli ping
      interval: 10s
      timeout: 3s
      start_period: 15s
    ports:
      - '6379:6379'
    networks:
      - net

  webserver:
    image: apache/airflow:2.2.3-python3.8
    command:
      - bash
      - -c
      - 'airflow db init && airflow db upgrade && airflow users create --username admin --firstname Admin --lastname User --password admin --role Admin --email [email protected] && airflow webserver'
    environment:
      AIRFLOW__API__AUTH_BACKEND: airflow.api.auth.backend.basic_auth
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/1
      AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres:5432/airflow
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__CORE__FERNET_KEY: yxfSDUw_7SG6BhBstIt7dFzL5rpnxvr_Jkv0tFyEJ3s=
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql://airflow:airflow@postgres:5432/airflow
      AIRFLOW__LOGGING__LOGGING_LEVEL: INFO
      AIRFLOW__WEBSERVER__SECRET_KEY: 0123456789
    healthcheck:
      test: curl --fail http://localhost:8080/health
      interval: 10s
      timeout: 10s
      retries: 10
      start_period: 90s
    ports:
      - '8080:8080'
    networks:
      - net

  scheduler:
    image: apache/airflow:2.2.3-python3.8
    command: scheduler
    environment:
      AIRFLOW__API__AUTH_BACKEND: airflow.api.auth.backend.basic_auth
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/1
      AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres:5432/airflow
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__CORE__FERNET_KEY: yxfSDUw_7SG6BhBstIt7dFzL5rpnxvr_Jkv0tFyEJ3s=
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql://airflow:airflow@postgres:5432/airflow
      AIRFLOW__LOGGING__LOGGING_LEVEL: INFO
      AIRFLOW__WEBSERVER__SECRET_KEY: 0123456789
    healthcheck:
      test: airflow db check
      interval: 20s
      timeout: 10s
      retries: 5
      start_period: 40s
    networks:
      - net

  worker_healthcheck:
    image: apache/airflow:2.2.3-python3.8
    command: celery worker
    environment:
      AIRFLOW__API__AUTH_BACKEND: airflow.api.auth.backend.basic_auth
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/1
      AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres:5432/airflow
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__CORE__FERNET_KEY: yxfSDUw_7SG6BhBstIt7dFzL5rpnxvr_Jkv0tFyEJ3s=
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql://airflow:airflow@postgres:5432/airflow
      AIRFLOW__LOGGING__LOGGING_LEVEL: DEBUG
      AIRFLOW__WEBSERVER__SECRET_KEY: 0123456789
    healthcheck:
      test:
        - "CMD-SHELL"
        - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
      interval: 10s
      timeout: 10s
      retries: 5
      start_period: 40s
    networks:
      - net

  worker_no_healthcheck:
    image: apache/airflow:2.2.3-python3.8
    command: celery worker
    environment:
      AIRFLOW__API__AUTH_BACKEND: airflow.api.auth.backend.basic_auth
      AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/1
      AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres:5432/airflow
      AIRFLOW__CORE__EXECUTOR: CeleryExecutor
      AIRFLOW__CORE__FERNET_KEY: yxfSDUw_7SG6BhBstIt7dFzL5rpnxvr_Jkv0tFyEJ3s=
      AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql://airflow:airflow@postgres:5432/airflow
      AIRFLOW__LOGGING__LOGGING_LEVEL: DEBUG
      AIRFLOW__WEBSERVER__SECRET_KEY: 0123456789
    networks:
      - net
```

### Operating System

Ubuntu 20.04.3 LTS

### Versions of Apache Airflow Providers

_No response_

### Deployment

Docker-Compose

### Deployment details

_No response_

### Anything else

_No response_

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
