kaxil commented on a change in pull request #7351: [AIRFLOW-XXXX] Add scheduler in production section to best practices doc URL: https://github.com/apache/airflow/pull/7351#discussion_r374380633
########## File path: docs/best-practices.rst ########## @@ -315,3 +315,16 @@ Some configurations such as Airflow Backend connection URI can be derived from b .. code:: sql_alchemy_conn_cmd = bash_command_to_run + + +Scheduler Uptime +----------------- + +Airflow users have for a long time been affected by a +`core Airflow bug <https://issues.apache.org/jira/browse/AIRFLOW-401>`_ +that causes the scheduler to hang without a trace. + +Until fully resolved, you can mitigate a few ways: + +* Set a reasonable run_duration setting in your `airflow.cfg`. `See example <https://github.com/astronomer/airflow-chart/blob/63bc503c67e2cd599df0b6f831d470d09bad7ee7/templates/configmap.yaml#L44>`_. +* Add an `exec` style health check to your helm charts on the scheduler deployment to fail if the scheduler has not heartbeat in a while. `See example <https://github.com/astronomer/helm.astronomer.io/pull/200/files>`_. Review comment: ```suggestion * Add an ``exec`` style health check to your helm charts on the scheduler deployment to fail if the scheduler has not heartbeat in a while. `See example <https://github.com/astronomer/helm.astronomer.io/pull/200/files>`_. ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
