This is an automated email from the ASF dual-hosted git repository. kaxilnaik pushed a commit to branch v1-10-test in repository https://gitbox.apache.org/repos/asf/airflow.git
commit 3c81b30b324077b76adb753220b63350842e5f73 Author: Ry Walker <[email protected]> AuthorDate: Tue Feb 4 03:43:02 2020 -0500 [AIRFLOW-XXXX] Add scheduler in production section (#7351) (cherry picked from commit 34f5d6f3c0a6f6f6f2066c70ede0c71245eb6931) --- docs/best-practices.rst | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/docs/best-practices.rst b/docs/best-practices.rst index 9b6b3ae..1a27e73 100644 --- a/docs/best-practices.rst +++ b/docs/best-practices.rst @@ -315,3 +315,16 @@ Some configurations such as Airflow Backend connection URI can be derived from b .. code:: sql_alchemy_conn_cmd = bash_command_to_run + + +Scheduler Uptime +----------------- + +Airflow users have for a long time been affected by a +`core Airflow bug <https://issues.apache.org/jira/browse/AIRFLOW-401>`_ +that causes the scheduler to hang without a trace. + +Until fully resolved, you can mitigate a few ways: + +* Set a reasonable run_duration setting in your ``airflow.cfg``. `Example config <https://github.com/astronomer/airflow-chart/blob/63bc503c67e2cd599df0b6f831d470d09bad7ee7/templates/configmap.yaml#L44>`_. +* Add an ``exec`` style health check to your helm charts on the scheduler deployment to fail if the scheduler has not heartbeat in a while. `Example health check definition <https://github.com/astronomer/helm.astronomer.io/pull/200/files>`_.
