Please include logs, the DAG structure, and anything else relevant, preferably in a Jira issue. This is really very little to go on. Sorry!
Sent from my iPhone

> On 25 Jul 2016, at 20:23, Nadeem Ahmed Nazeer <[email protected]> wrote:
>
> Hello,
>
> Would really appreciate any response on this.
>
> I've also observed that the scheduler gets stuck when running big jobs that
> run for hours. We have a MapReduce job that runs for 5-6 hours. After this
> job is complete, the scheduler doesn't seem to run the downstream tasks, and
> it's stuck until it's manually restarted. This time I don't even see the HTTP
> errors that I mentioned earlier. Not sure why it would get stuck.
>
> Airflow version 1.7.1.2
>
> Thanks,
> Nadeem
>
> On Wed, Jul 20, 2016 at 5:18 PM, Nadeem Ahmed Nazeer <[email protected]>
> wrote:
>
>> Hello,
>>
>> My Airflow scheduler seems to be getting stuck due to an error.
>>
>> From the scheduler logs:
>>
>> HTTPError: HTTP 502: socket error
>> Logged from file jobs.py, line 574
>>
>> Looks like it happens when the scheduler is trying to get the list of
>> queued tasks from the metadata database. There are no errors being reported
>> on the DB side, though. The metadata database is a MySQL RDS instance
>> running on AWS.
>>
>> I have to restart the scheduler service manually multiple times to
>> get it going before it gets stuck again. It appears that the scheduler has
>> some trouble polling the DB occasionally. But this is the only error I see
>> in the logs.
>>
>> Below is my config:
>>
>> sql_alchemy_pool_recycle = 3600
>> parallelism = 32
>> celeryd_concurrency = 4
>> scheduler_heartbeat_sec = 120
>>
>> Has anyone faced a similar error with the scheduler or metadata DB?
>> Please share any input that could help me resolve this issue.
>>
>> Is there an optimal configuration for the scheduler that I can put in
>> airflow.cfg to make the scheduler run smoothly and fast? Please share
>> your scheduler-related configs if you have a setup that is running without
>> problems.
>>
>> Thanks,
>> Nadeem
>>
