Hello, My airflow scheduler seems to be getting stuck due to an error.
>From scheduler logs, HTTPError: HTTP 502: socket error Logged from file jobs.py, line 574 Looks like it happens when the scheduler is trying to get the list of queued tasks from the metadata database. There are no errors being reported on the DB side though. The metadata database is a mysql RDS instance running on aws. I will have to restart the scheduler service manually multiple times to get it going before it gets stuck again. It appears that the scheduler has some trouble polling the db occasionally. But, this is only error i see from the logs. Below is my config, sql_alchemy_pool_recycle = 3600 parallelism = 32 celeryd_concurrency = 4 scheduler_heartbeat_sec = 120 Has someone faced this similar error with the scheduler or metadata db? Please share any inputs that could help me resolve this issue. Is there an optimal configuration for the scheduler that i can put in airflow.cfg to enable the scheduler run smoothly and be fast? Please share the scheduler related configs if you have one that is running without problems. Thanks, Nadeem
