Hello,

I would really appreciate any response on this.
I've also observed that the scheduler gets stuck when running big jobs that run for hours. We have a MapReduce job that runs for 5-6 hours. After this job completes, the scheduler doesn't seem to run the downstream tasks, and it stays stuck until it is manually restarted. This time I don't even see the HTTP errors that I mentioned earlier. I'm not sure why it would get stuck.

Airflow version: 1.7.1.2

Thanks,
Nadeem

On Wed, Jul 20, 2016 at 5:18 PM, Nadeem Ahmed Nazeer <[email protected]> wrote:

> Hello,
>
> My Airflow scheduler seems to be getting stuck due to an error.
>
> From the scheduler logs:
>
> HTTPError: HTTP 502: socket error
> Logged from file jobs.py, line 574
>
> It looks like this happens when the scheduler is trying to get the list of
> queued tasks from the metadata database. There are no errors being reported
> on the DB side, though. The metadata database is a MySQL RDS instance
> running on AWS.
>
> I have to restart the scheduler service manually multiple times to
> get it going before it gets stuck again. It appears that the scheduler
> occasionally has trouble polling the DB. But this is the only error I see
> in the logs.
>
> Below is my config:
>
> sql_alchemy_pool_recycle = 3600
> parallelism = 32
> celeryd_concurrency = 4
> scheduler_heartbeat_sec = 120
>
> Has anyone faced a similar error with the scheduler or metadata DB?
> Please share any input that could help me resolve this issue.
>
> Is there an optimal configuration for the scheduler that I can put in
> airflow.cfg so that the scheduler runs smoothly and fast? Please share
> your scheduler-related configs if you have a setup that is running without
> problems.
>
> Thanks,
> Nadeem
