Hello,

Sorry for writing to the dev list, but as there is no user list yet I
decided this was the best place. We are currently running Airflow with the
SequentialExecutor and a Postgres DB as the backend. We run the Airflow
scheduler and webserver under supervisor so that they are automatically
restarted if either fails.

Normally this setup works fine. However, we have noticed that the scheduler
sometimes stops scheduling jobs and only resumes when we manually restart it
from supervisor. I found the following message in the Airflow scheduler
error log, so the stall seems to be related to the connection to the DB:

<class 'sqlalchemy.exc.DatabaseError'> (psycopg2.DatabaseError) SSL SYSCALL error: Connection timed out
 [SQL: 'UPDATE job SET latest_heartbeat=%(latest_heartbeat)s WHERE job.id = %(job_id)s'] [parameters: {'latest_heartbeat': datetime.datetime(2016, 7, 11, 10, 26, 7, 44521), 'job_id': 10246}]

Also, when I look for the job id in the Airflow DB I can see the following:

  id   | dag_id |  state  |   job_type   |         start_date         | end_date |      latest_heartbeat      |   executor_class   |
-------+--------+---------+--------------+----------------------------+----------+----------------------------+--------------------+----------+
 10246 |        | running | SchedulerJob | 2016-07-08 15:38:06.911346 |          | 2016-07-14 05:30:56.407149 | SequentialExecutor |

The latest heartbeat corresponds to the moment when the scheduler stopped
scheduling jobs. Our supervisor configuration for the scheduler is the
following:

[program:airflow-scheduler]
command= airflow scheduler
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/logs/airflow-logs/airflow-scheduler.err.log
stdout_logfile=/var/logs/airflow-logs/airflow-scheduler.out.log

I have now added these two lines to the supervisor configuration, in case
the problem was that supervisor was not detecting that the scheduler had
quit:

stopsignal=QUIT
stopasgroup=true
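
Supervisor's autorestart only reacts to a process that exits, so if the
scheduler hangs while staying alive it would go unnoticed. Since the job row
keeps state "running" while latest_heartbeat stops advancing, a stall could
in principle be detected from the job table itself. The following is only a
minimal sketch of that comparison. The helper name is_scheduler_stalled and
the 5-minute threshold are assumptions made up for illustration, not Airflow
settings, and fetching the row from Postgres is left out:

```python
from datetime import datetime, timedelta

# Hypothetical threshold, chosen only for illustration (not an Airflow setting).
STALE_AFTER = timedelta(minutes=5)


def is_scheduler_stalled(state, latest_heartbeat, now, stale_after=STALE_AFTER):
    """Return True if a job row looks like a scheduler that stopped heartbeating.

    `state` and `latest_heartbeat` are the values of the corresponding
    columns of the job table; `now` is the current time in the same
    timezone convention as `latest_heartbeat`.
    """
    # Only a job that still claims to be running can be considered stalled.
    if state != "running" or latest_heartbeat is None:
        return False
    return now - latest_heartbeat > stale_after


# Values taken from the job row shown above (id 10246).
heartbeat = datetime(2016, 7, 14, 5, 30, 56, 407149)

# About half a minute after the last heartbeat: still considered healthy.
print(is_scheduler_stalled("running", heartbeat, now=datetime(2016, 7, 14, 5, 31, 30)))

# About half an hour after the last heartbeat: considered stalled.
print(is_scheduler_stalled("running", heartbeat, now=datetime(2016, 7, 14, 6, 0, 0)))
```

A cron job could run a check like this against the SchedulerJob row and call
"supervisorctl restart airflow-scheduler" when it returns True. This would
only work around the stall; it would not explain the underlying connection
timeout.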

If anyone has had a similar problem, or has ideas about what could be
causing the scheduler to stop in the first place and how we could avoid
having to restart it manually, we would much appreciate hearing them.

Cheers,

-- 
Tamara Mendt, Data Engineer, HelloFresh Global