Hello,

Would really appreciate any response on this.

I've also observed that the scheduler gets stuck when running big jobs that
run for hours. We have a MapReduce job that runs for 5-6 hours. After this
job completes, the scheduler doesn't seem to run the downstream tasks, and it
stays stuck until it's manually restarted. This time I don't even see the HTTP
errors that I mentioned earlier. I'm not sure why it would get stuck.

Airflow version 1.7.1.2

Thanks,
Nadeem

On Wed, Jul 20, 2016 at 5:18 PM, Nadeem Ahmed Nazeer <[email protected]>
wrote:

> Hello,
>
> My airflow scheduler seems to be getting stuck due to an error.
>
> From scheduler logs,
>
> HTTPError: HTTP 502: socket error
> Logged from file jobs.py, line 574
>
> Looks like it happens when the scheduler is trying to get the list of
> queued tasks from the metadata database. There are no errors being reported
> on the DB side though. The metadata database is a MySQL RDS instance
> running on AWS.
>
> I have to restart the scheduler service manually multiple times to
> get it going before it gets stuck again. It appears that the scheduler
> occasionally has trouble polling the DB. But this is the only error I see
> in the logs.
>
> Below is my config,
>
> sql_alchemy_pool_recycle = 3600
> parallelism = 32
> celeryd_concurrency = 4
> scheduler_heartbeat_sec = 120
>
> Has anyone faced a similar error with the scheduler or metadata DB?
> Please share any input that could help me resolve this issue.
>
> Is there an optimal configuration for the scheduler that I can put in
> airflow.cfg to make the scheduler run smoothly and fast? Please share
> your scheduler-related configs if you have a setup that is running without
> problems.
>
> Thanks,
> Nadeem
>
