alkismavridis commented on issue #55768: URL: https://github.com/apache/airflow/issues/55768#issuecomment-3359816439
I can confirm that the workaround from @kaxil is working for us. We experimented with the value of `num_runs` and found that something in the range of 3000 does the job for us. The scheduler restarts approx. every 50 minutes and the RAM usage stays in the range of 1G - 1.5G, which is OK.

<img width="491" height="284" alt="Image" src="https://github.com/user-attachments/assets/a71404c8-8160-4399-959b-1776bb664e53" />

@lucidumio I am just guessing here, but I assume the airflow process ends (because of `num_runs`), but your Docker container does not restart. Your health check runs the command to check if the scheduler is running, the answer is no, and thus you end up with an unhealthy container.

The way we solve this problem is that our healthcheck command ALSO terminates the container when it finds it unhealthy. Docker then restarts it.

Here is our docker-compose section for the scheduler. Please note 2 things:

- `restart: always`
- Our healthcheck command. The section after the OR operator (`||`) kills all PIDs: it first sends signal 15 (SIGTERM) and, 10 seconds later, signal 9 (SIGKILL). Thus, the container restarts.

```yaml
airflow-scheduler:
  <<: *airflow-common
  container_name: airflow-scheduler
  command: scheduler
  healthcheck:
    test: ["CMD-SHELL", "airflow jobs check --job-type SchedulerJob --local || bash -c 'kill -s 15 -1 && (sleep 10; kill -s 9 -1)'"]
    interval: 30s
    timeout: 10s
    retries: 5
    start_period: 150s
  restart: always
  depends_on:
    <<: *airflow-common-depends-on
    airflow-init:
      condition: service_completed_successfully
  mem_limit: 4000m
```
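The comment does not show where `num_runs` is set. As a minimal sketch, assuming it is passed through Airflow's standard environment-variable form of the `[scheduler] num_runs` option, the relevant part of a compose service could look like the fragment below. The service layout here is illustrative and is not taken from the commenter's actual file.

```yaml
services:
  airflow-scheduler:
    command: scheduler
    # Restart the container once the scheduler process exits.
    restart: always
    environment:
      # Assumption: num_runs is configured via the environment variable.
      # 3000 is the value the comment reports as working; tune it per deployment.
      AIRFLOW__SCHEDULER__NUM_RUNS: "3000"
```

If the service also merges a shared block such as `<<: *airflow-common`, note that YAML merge keys are shallow: a local `environment:` key replaces the merged one entirely, so in that case the variable would more likely belong in the shared block. With the reported figures (about 3000 loops per 50 minutes), this setup runs roughly one scheduler loop per second.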
