mtraynham opened a new issue #21026:
URL: https://github.com/apache/airflow/issues/21026


   ### Apache Airflow version
   
   2.2.3 (latest released)
   
   ### What happened
   
   With a docker setup (as defined in the link below), the `airflow-worker` service's `healthcheck.test` command causes a gradual increase in memory use over time.  This was observed with both Airflow 2.1.4 and 2.2.3.
   
   
https://github.com/apache/airflow/blob/958860fcd7c9ecdf60b7ebeef4397b348835c8db/docs/apache-airflow/start/docker-compose.yaml#L131-L137
   
   We first observed this in our AWS ECS cluster, where a worker with 0.5 CPU / 1 GB of memory unexpectedly had a task fail at the second dip in memory use, which prompted further investigation.  The metrics page looked like the image below.
   
![image](https://user-images.githubusercontent.com/5741321/150593627-3bf63cfc-e8b2-492d-b8e6-5cc2933a3150.png)
   
   We raised the allocation to 2 CPU / 4 GB of memory and restarted the service, which still produced a gradual increase in memory use.
   
![image](https://user-images.githubusercontent.com/5741321/150593887-d02b454a-d51a-42ad-8479-3f7cce8291f5.png)
   
   ### What you expected to happen
   
   Memory use should generally not increase while the system is idle; instead, it should spike during each healthcheck and then be released back to the host.
   
   ### How to reproduce
   
   We use a modified version of the compose file and favor `docker stack` instead, but the same setup should apply.  A slimmed-down compose file is below.  The stack has two workers, one with a healthcheck and one without.
   
   Deploying the stack is fairly simple:
   
   ```bash
   $ docker stack deploy -c docker-compose.yaml airflow
   ```
   
   A secondary script was written to scrape the Docker statistics at 10-second intervals and write them to a CSV file.
   
   *collect_stats.sh*
   ```bash
   #!/usr/bin/env sh
   
    healthcheck=$(docker ps --format "{{.Names}}" | grep worker_healthcheck)
    no_healthcheck=$(docker ps --format "{{.Names}}" | grep worker_no_healthcheck)
    containers="${healthcheck} ${no_healthcheck}"

    echo "Date,Container,CPU Percent,Mem Usage,Mem Percent"
    while true; do
        time=$(date --utc +%FT%T%Z)
        # ${containers} is intentionally unquoted so each name becomes its own argument
        docker stats ${containers} \
          --format "table {{.Name}},{{.CPUPerc}},{{.MemUsage}},{{.MemPerc}}" \
          --no-stream \
          | grep worker \
          | awk -v T="${time}," '{ print T $0 }'
        sleep 10
    done
   
   ```
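   To quantify the growth, the CSV above can be summarized per container.  The sketch below is illustrative and not part of the original setup; the `mem_growth` helper name is ours, and it assumes the column layout produced by *collect_stats.sh* and docker-stats memory cells such as `512.3MiB / 1GiB`.

    ```bash
    #!/usr/bin/env sh

    # Illustrative helper (not part of the original report): print each
    # container's memory delta between its first and last CSV sample.
    mem_growth() {
        awk -F, '
          # Convert a docker-stats size such as "512.3MiB" to bytes.
          function to_bytes(s,    n) {
              n = s + 0
              if (s ~ /GiB/) return n * 1024 * 1024 * 1024
              if (s ~ /MiB/) return n * 1024 * 1024
              if (s ~ /KiB/) return n * 1024
              return n
          }
          NR > 1 {
              split($4, mem, "/")          # "usage / limit" -> keep usage
              b = to_bytes(mem[1])
              if (!($2 in first)) first[$2] = b
              last[$2] = b
          }
          END {
              for (c in first)
                  printf "%s: %+.1f MiB\n", c, (last[c] - first[c]) / 1048576
          }
        ' "$1"
    }
    ```

   For example, `mem_growth stats.csv` prints one line per container; a steadily positive delta for `worker_healthcheck` alongside a flat `worker_no_healthcheck` would point at the healthcheck command.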
   
   ```yaml
   ---
   version: '3.7'
   
   networks:
     net:
       driver: overlay
       attachable: true
   
   volumes:
     postgres-data:
     redis-data:
   
   services:
     postgres:
       image: postgres:13.2-alpine
       volumes:
         - postgres-data:/var/lib/postgresql/data
       environment:
         POSTGRES_USER: airflow
         POSTGRES_PASSWORD: airflow
         POSTGRES_DB: airflow
       healthcheck:
         test: pg_isready -U airflow -d airflow
         interval: 10s
         timeout: 3s
         start_period: 15s
       ports:
         - '5432:5432'
       networks:
         - net
   
     redis:
       image: redis:6.2
       volumes:
         - redis-data:/data
       healthcheck:
         test: redis-cli ping
         interval: 10s
         timeout: 3s
         start_period: 15s
       ports:
         - '6379:6379'
       networks:
         - net
   
     webserver:
       image: apache/airflow:2.2.3-python3.8
       command:
         - bash
         - -c
         - 'airflow db init
         && airflow db upgrade
          && airflow users create --username admin --firstname Admin --lastname User --password admin --role Admin --email [email protected]
         && airflow webserver'
       environment:
         AIRFLOW__API__AUTH_BACKEND: airflow.api.auth.backend.basic_auth
         AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/1
          AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres:5432/airflow
         AIRFLOW__CORE__EXECUTOR: CeleryExecutor
         AIRFLOW__CORE__FERNET_KEY: yxfSDUw_7SG6BhBstIt7dFzL5rpnxvr_Jkv0tFyEJ3s=
          AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql://airflow:airflow@postgres:5432/airflow
         AIRFLOW__LOGGING__LOGGING_LEVEL: INFO
         AIRFLOW__WEBSERVER__SECRET_KEY: 0123456789
       healthcheck:
         test: curl --fail http://localhost:8080/health
         interval: 10s
         timeout: 10s
         retries: 10
         start_period: 90s
       ports:
         - '8080:8080'
       networks:
         - net
   
     scheduler:
       image: apache/airflow:2.2.3-python3.8
       command: scheduler
       environment:
         AIRFLOW__API__AUTH_BACKEND: airflow.api.auth.backend.basic_auth
         AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/1
          AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres:5432/airflow
         AIRFLOW__CORE__EXECUTOR: CeleryExecutor
         AIRFLOW__CORE__FERNET_KEY: yxfSDUw_7SG6BhBstIt7dFzL5rpnxvr_Jkv0tFyEJ3s=
          AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql://airflow:airflow@postgres:5432/airflow
         AIRFLOW__LOGGING__LOGGING_LEVEL: INFO
         AIRFLOW__WEBSERVER__SECRET_KEY: 0123456789
       healthcheck:
         test: airflow db check
         interval: 20s
         timeout: 10s
         retries: 5
         start_period: 40s
       networks:
         - net
   
     worker_healthcheck:
       image: apache/airflow:2.2.3-python3.8
       command: celery worker
       environment:
         AIRFLOW__API__AUTH_BACKEND: airflow.api.auth.backend.basic_auth
         AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/1
          AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres:5432/airflow
         AIRFLOW__CORE__EXECUTOR: CeleryExecutor
         AIRFLOW__CORE__FERNET_KEY: yxfSDUw_7SG6BhBstIt7dFzL5rpnxvr_Jkv0tFyEJ3s=
          AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql://airflow:airflow@postgres:5432/airflow
         AIRFLOW__LOGGING__LOGGING_LEVEL: DEBUG
         AIRFLOW__WEBSERVER__SECRET_KEY: 0123456789
       healthcheck:
         test:
           - "CMD-SHELL"
            - 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
         interval: 10s
         timeout: 10s
         retries: 5
         start_period: 40s
       networks:
         - net
   
     worker_no_healthcheck:
       image: apache/airflow:2.2.3-python3.8
       command: celery worker
       environment:
         AIRFLOW__API__AUTH_BACKEND: airflow.api.auth.backend.basic_auth
         AIRFLOW__CELERY__BROKER_URL: redis://redis:6379/1
          AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://airflow:airflow@postgres:5432/airflow
         AIRFLOW__CORE__EXECUTOR: CeleryExecutor
         AIRFLOW__CORE__FERNET_KEY: yxfSDUw_7SG6BhBstIt7dFzL5rpnxvr_Jkv0tFyEJ3s=
          AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql://airflow:airflow@postgres:5432/airflow
         AIRFLOW__LOGGING__LOGGING_LEVEL: DEBUG
         AIRFLOW__WEBSERVER__SECRET_KEY: 0123456789
       networks:
         - net
   ```
   
   ### Operating System
   
   Ubuntu 20.04.3 LTS
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Docker-Compose
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

