[GitHub] [airflow] o-nikolas commented on a diff in pull request #32669: Store config description in Airflow configuration object

via GitHub Wed, 19 Jul 2023 15:01:23 -0700


o-nikolas commented on code in PR #32669:
URL: https://github.com/apache/airflow/pull/32669#discussion_r1268708267



##########
airflow/config_templates/default_airflow.cfg:
##########
@@ -16,1507 +15,31 @@
 # specific language governing permissions and limitations
 # under the License.
 
-# This is the template for Airflow's default configuration. When Airflow is
-# imported, it looks for a configuration file at $AIRFLOW_HOME/airflow.cfg. If
-# it doesn't exist, Airflow uses this template to generate it by replacing
-# variables in curly braces with their global values from configuration.py.
-
-# Users should not modify this file; they should customize the generated
-# airflow.cfg instead.
-
-
-# ----------------------- TEMPLATE BEGINS HERE -----------------------
-
-[core]
-# The folder where your airflow pipelines live, most likely a
-# subfolder in a code repository. This path must be absolute.
-dags_folder = {AIRFLOW_HOME}/dags
-
-# Hostname by providing a path to a callable, which will resolve the hostname.
-# The format is "package.function".
 #
-# For example, default value "airflow.utils.net.getfqdn" means that result 
from patched
-# version of socket.getfqdn() - see 
https://github.com/python/cpython/issues/49254.
+# NOTE:
 #
-# No argument should be required in the function specified.
-# If using IP address as hostname is preferred, use value 
``airflow.utils.net.get_host_ip_address``
-hostname_callable = airflow.utils.net.getfqdn
-
-# A callable to check if a python file has airflow dags defined or not
-# with argument as: `(file_path: str, zip_file: zipfile.ZipFile | None = None)`
-# return True if it has dags otherwise False
-# If this is not provided, Airflow uses its own heuristic rules.
-might_contain_dag_callable = 
airflow.utils.file.might_contain_dag_via_default_heuristic
-
-# Default timezone in case supplied date times are naive
-# can be utc (default), system, or any IANA timezone string (e.g. 
Europe/Amsterdam)
-default_timezone = utc
-
-# The executor class that airflow should use. Choices include
-# ``SequentialExecutor``, ``LocalExecutor``, ``CeleryExecutor``, 
``DaskExecutor``,
-# ``KubernetesExecutor``, ``CeleryKubernetesExecutor`` or the
-# full import path to the class when using a custom executor.
-executor = SequentialExecutor
-
-# The auth manager class that airflow should use. Full import path to the auth 
manager class.
-auth_manager = airflow.auth.managers.fab.fab_auth_manager.FabAuthManager
-
-# This defines the maximum number of task instances that can run concurrently 
per scheduler in
-# Airflow, regardless of the worker count. Generally this value, multiplied by 
the number of
-# schedulers in your cluster, is the maximum number of task instances with the 
running
-# state in the metadata database.
-parallelism = 32
-
-# The maximum number of task instances allowed to run concurrently in each 
DAG. To calculate
-# the number of tasks that is running concurrently for a DAG, add up the 
number of running
-# tasks for all DAG runs of the DAG. This is configurable at the DAG level 
with ``max_active_tasks``,
-# which is defaulted as ``max_active_tasks_per_dag``.
+# IF YOU ARE LOOKING FOR DEFAULT CONFIGURATION FILE HERE  - LOOK NO MORE. READ EXPLANATION BELOW!
 #
-# An example scenario when this would be useful is when you want to stop a new 
dag with an early
-# start date from stealing all the executor slots in a cluster.
-max_active_tasks_per_dag = 16
-
-# Are DAGs paused by default at creation
-dags_are_paused_at_creation = True
-
-# The maximum number of active DAG runs per DAG. The scheduler will not create 
more DAG runs
-# if it reaches the limit. This is configurable at the DAG level with 
``max_active_runs``,
-# which is defaulted as ``max_active_runs_per_dag``.
-max_active_runs_per_dag = 16
-
-# The name of the method used in order to start Python processes via the 
multiprocessing module.
-# This corresponds directly with the options available in the Python docs:
-# 
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.set_start_method.
-# Must be one of the values returned by:
-# 
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.get_all_start_methods.
-# Example: mp_start_method = fork
-# mp_start_method =
-
-# Whether to load the DAG examples that ship with Airflow. It's good to
-# get started, but you probably want to set this to ``False`` in a production
-# environment
-load_examples = True
-
-# Path to the folder containing Airflow plugins
-plugins_folder = {AIRFLOW_HOME}/plugins
-
-# Should tasks be executed via forking of the parent process ("False",
-# the speedier option) or by spawning a new python process ("True" slow,
-# but means plugin changes picked up by tasks straight away)
-execute_tasks_new_python_interpreter = False
-
-# Secret key to save connection passwords in the db
-fernet_key = {FERNET_KEY}
-
-# Whether to disable pickling dags
-donot_pickle = True
-
-# How long before timing out a python file import
-dagbag_import_timeout = 30.0
-
-# Should a traceback be shown in the UI for dagbag import errors,
-# instead of just the exception message
-dagbag_import_error_tracebacks = True
-
-# If tracebacks are shown, how many entries from the traceback should be shown
-dagbag_import_error_traceback_depth = 2
-
-# How long before timing out a DagFileProcessor, which processes a dag file
-dag_file_processor_timeout = 50
-
-# The class to use for running task instances in a subprocess.
-# Choices include StandardTaskRunner, CgroupTaskRunner or the full import path 
to the class
-# when using a custom task runner.
-task_runner = StandardTaskRunner
-
-# If set, tasks without a ``run_as_user`` argument will be run with this user
-# Can be used to de-elevate a sudo user running Airflow when executing tasks
-default_impersonation =
-
-# What security module to use (for example kerberos)
-security =
-
-# Turn unit test mode on (overwrites many configuration options with test
-# values at runtime)
-unit_test_mode = False
-
-# Whether to enable pickling for xcom (note that this is insecure and allows 
for
-# RCE exploits).
-enable_xcom_pickling = False
-
-# What classes can be imported during deserialization. This is a multi line 
value.
-# The individual items will be parsed as regexp. Python built-in classes (like 
dict)
-# are always allowed. Bare "." will be replaced so you can set airflow.* .
-allowed_deserialization_classes = airflow\..*
-
-# When a task is killed forcefully, this is the amount of time in seconds that
-# it has to cleanup after it is sent a SIGTERM, before it is SIGKILLED
-killed_task_cleanup_time = 60
-
-# Whether to override params with dag_run.conf. If you pass some key-value 
pairs
-# through ``airflow dags backfill -c`` or
-# ``airflow dags trigger -c``, the key-value pairs will override the existing 
ones in params.
-dag_run_conf_overrides_params = True
-
-# If enabled, Airflow will only scan files containing both ``DAG`` and 
``airflow`` (case-insensitive).
-dag_discovery_safe_mode = True
-
-# The pattern syntax used in the ".airflowignore" files in the DAG 
directories. Valid values are
-# ``regexp`` or ``glob``.
-dag_ignore_file_syntax = regexp
-
-# The number of retries each task is going to have by default. Can be 
overridden at dag or task level.
-default_task_retries = 0
-
-# The number of seconds each task is going to wait by default between retries. 
Can be overridden at
-# dag or task level.
-default_task_retry_delay = 300
-
-# The maximum delay (in seconds) each task is going to wait by default between 
retries.
-# This is a global setting and cannot be overridden at task or DAG level.
-max_task_retry_delay = 86400
-
-# The weighting method used for the effective total priority weight of the task
-default_task_weight_rule = downstream
-
-# The default task execution_timeout value for the operators. Expected an 
integer value to
-# be passed into timedelta as seconds. If not specified, then the value is 
considered as None,
-# meaning that the operators are never timed out by default.
-default_task_execution_timeout =
-
-# Updating serialized DAG can not be faster than a minimum interval to reduce 
database write rate.
-min_serialized_dag_update_interval = 30
-
-# If True, serialized DAGs are compressed before writing to DB.
-# Note: this will disable the DAG dependencies view
-compress_serialized_dags = False
-
-# Fetching serialized DAG can not be faster than a minimum interval to reduce 
database
-# read rate. This config controls when your DAGs are updated in the Webserver
-min_serialized_dag_fetch_interval = 10
-
-# Maximum number of Rendered Task Instance Fields (Template Fields) per task 
to store
-# in the Database.
-# All the template_fields for each of Task Instance are stored in the Database.
-# Keeping this number small may cause an error when you try to view 
``Rendered`` tab in
-# TaskInstance view for older tasks.
-max_num_rendered_ti_fields_per_task = 30
-
-# On each dagrun check against defined SLAs
-check_slas = True
-
-# Path to custom XCom class that will be used to store and resolve operators 
results
-# Example: xcom_backend = path.to.CustomXCom
-xcom_backend = airflow.models.xcom.BaseXCom
-
-# By default Airflow plugins are lazily-loaded (only loaded when required). 
Set it to ``False``,
-# if you want to load plugins whenever 'airflow' is invoked via cli or loaded 
from module.
-lazy_load_plugins = True
-
-# By default Airflow providers are lazily-discovered (discovery and imports 
happen only when required).
-# Set it to False, if you want to discover providers whenever 'airflow' is 
invoked via cli or
-# loaded from module.
-lazy_discover_providers = True
-
-# Hide sensitive Variables or Connection extra json keys from UI and task logs 
when set to True
+# This file used to have something that was similar to the default Airflow configuration but it was
+# really just a template. It was used to generate the final configuration and it was confusing
+# if you copied it to your configuration and some of values were wrong.
 #
-# (Connection passwords are always hidden in logs)
-hide_sensitive_var_conn_fields = True
-
-# A comma-separated list of extra sensitive keywords to look for in variables 
names or connection's
-# extra JSON.
-sensitive_var_conn_names =
-
-# Task Slot counts for ``default_pool``. This setting would not have any 
effect in an existing
-# deployment where the ``default_pool`` is already created. For existing 
deployments, users can
-# change the number of slots using Webserver, API or the CLI
-default_pool_task_slot_count = 128
-
-# The maximum list/dict length an XCom can push to trigger task mapping. If 
the pushed list/dict has a
-# length exceeding this value, the task pushing the XCom will be failed 
automatically to prevent the
-# mapped tasks from clogging the scheduler.
-max_map_length = 1024
-
-# The default umask to use for process when run in daemon mode (scheduler, 
worker,  etc.)
+# The first time you run Airflow, it will create a file called ``airflow.cfg`` in
+# your ``$AIRFLOW_HOME`` directory (``~/airflow`` by default). This is in order to make it easy to
+# "play" with airflow configuration.
 #
-# This controls the file-creation mode mask which determines the initial value 
of file permission bits
-# for newly created files.
+# However, for production case you are advised to generate the configuration using command line:
 #
-# This value is treated as an octal-integer.
-daemon_umask = 0o077
-
-# Class to use as dataset manager.
-# Example: dataset_manager_class = airflow.datasets.manager.DatasetManager
-# dataset_manager_class =
-
-# Kwargs to supply to dataset manager.
-# Example: dataset_manager_kwargs = {{"some_param": "some_value"}}
-# dataset_manager_kwargs =
-
-# (experimental) Whether components should use Airflow Internal API for DB 
connectivity.
-database_access_isolation = False
-
-# (experimental) Airflow Internal API url. Only used if [core] 
database_access_isolation is True.
-# Example: internal_api_url = http://localhost:8080
-# internal_api_url =
-
-# The ability to allow testing connections across Airflow UI, API and CLI.
-# Supported options: Disabled, Enabled, Hidden. Default: Disabled
-# Disabled - Disables the test connection functionality and disables the Test 
Connection button in UI.
-# Enabled - Enables the test connection functionality and shows the Test 
Connection button in UI.
-# Hidden - Disables the test connection functionality and hides the Test 
Connection button in UI.
-# Before setting this to Enabled, make sure that you review the users who are 
able to add/edit
-# connections and ensure they are trusted. Connection testing can be done 
maliciously leading to
-# undesired and insecure outcomes. For more information on capabilities of 
users, see the documentation:
-# 
https://airflow.apache.org/docs/apache-airflow/stable/security/index.html#capabilities-of-authenticated-ui-users
-test_connection = Disabled
-
-[database]
-# Path to the ``alembic.ini`` file. You can either provide the file path 
relative
-# to the Airflow home directory or the absolute path if it is located 
elsewhere.
-alembic_ini_file_path = alembic.ini
-
-# The SqlAlchemy connection string to the metadata database.
-# SqlAlchemy supports many different database engines.
-# More information here:
-# 
http://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#database-uri
-sql_alchemy_conn = sqlite:///{AIRFLOW_HOME}/airflow.db
-
-# Extra engine specific keyword args passed to SQLAlchemy's create_engine, as 
a JSON-encoded value
-# Example: sql_alchemy_engine_args = {{"arg1": True}}
-# sql_alchemy_engine_args =
-
-# The encoding for the databases
-sql_engine_encoding = utf-8
-
-# Collation for ``dag_id``, ``task_id``, ``key``, ``external_executor_id`` 
columns
-# in case they have different encoding.
-# By default this collation is the same as the database collation, however for 
``mysql`` and ``mariadb``
-# the default is ``utf8mb3_bin`` so that the index sizes of our index keys 
will not exceed
-# the maximum size of allowed index when collation is set to ``utf8mb4`` 
variant
-# (see https://github.com/apache/airflow/pull/17603#issuecomment-901121618).
-# sql_engine_collation_for_ids =
-
-# If SqlAlchemy should pool database connections.
-sql_alchemy_pool_enabled = True
-
-# The SqlAlchemy pool size is the maximum number of database connections
-# in the pool. 0 indicates no limit.
-sql_alchemy_pool_size = 5
-
-# The maximum overflow size of the pool.
-# When the number of checked-out connections reaches the size set in pool_size,
-# additional connections will be returned up to this limit.
-# When those additional connections are returned to the pool, they are 
disconnected and discarded.
-# It follows then that the total number of simultaneous connections the pool 
will allow
-# is pool_size + max_overflow,
-# and the total number of "sleeping" connections the pool will allow is 
pool_size.
-# max_overflow can be set to ``-1`` to indicate no overflow limit;
-# no limit will be placed on the total number of concurrent connections. 
Defaults to ``10``.
-sql_alchemy_max_overflow = 10
-
-# The SqlAlchemy pool recycle is the number of seconds a connection
-# can be idle in the pool before it is invalidated. This config does
-# not apply to sqlite. If the number of DB connections is ever exceeded,
-# a lower config value will allow the system to recover faster.
-sql_alchemy_pool_recycle = 1800
-
-# Check connection at the start of each connection pool checkout.
-# Typically, this is a simple statement like "SELECT 1".
-# More information here:
-# 
https://docs.sqlalchemy.org/en/14/core/pooling.html#disconnect-handling-pessimistic
-sql_alchemy_pool_pre_ping = True
-
-# The schema to use for the metadata database.
-# SqlAlchemy supports databases with the concept of multiple schemas.
-sql_alchemy_schema =
-
-# Import path for connect args in SqlAlchemy. Defaults to an empty dict.
-# This is useful when you want to configure db engine args that SqlAlchemy 
won't parse
-# in connection string.
-# See 
https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine.params.connect_args
-# Example: sql_alchemy_connect_args = {{"timeout": 30}}
-# sql_alchemy_connect_args =
-
-# Whether to load the default connections that ship with Airflow. It's good to
-# get started, but you probably want to set this to ``False`` in a production
-# environment
-load_default_connections = True
-
-# Number of times the code should be retried in case of DB Operational Errors.
-# Not all transactions will be retried as it can cause undesired state.
-# Currently it is only used in ``DagFileProcessor.process_file`` to retry 
``dagbag.sync_to_db``.
-max_db_retries = 3
-
-# Whether to run alembic migrations during Airflow start up. Sometimes this 
operation can be expensive,
-# and the users can assert the correct version through other means (e.g. 
through a Helm chart).
-# Accepts "True" or "False".
-check_migrations = True
-
-[logging]
-# The folder where airflow should store its log files.
-# This path must be absolute.
-# There are a few existing configurations that assume this is set to the 
default.
-# If you choose to override this you may need to update the 
dag_processor_manager_log_location and
-# dag_processor_manager_log_location settings as well.
-base_log_folder = {AIRFLOW_HOME}/logs
-
-# Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic 
Search.
-# Set this to True if you want to enable remote logging.
-remote_logging = False
-
-# Users must supply an Airflow connection id that provides access to the 
storage
-# location. Depending on your remote logging service, this may only be used for
-# reading logs, not writing them.
-remote_log_conn_id =
-
-# Whether the local log files for GCS, S3, WASB and OSS remote logging should 
be deleted after
-# they are uploaded to the remote location.
-delete_local_logs = False
-
-# Path to Google Credential JSON file. If omitted, authorization based on `the 
Application Default
-# Credentials
-# 
<https://cloud.google.com/docs/authentication/production#finding_credentials_automatically>`__
 will
-# be used.
-google_key_path =
-
-# Storage bucket URL for remote logging
-# S3 buckets should start with "s3://"
-# Cloudwatch log groups should start with "cloudwatch://"
-# GCS buckets should start with "gs://"
-# WASB buckets should start with "wasb" just to help Airflow select correct 
handler
-# Stackdriver logs should start with "stackdriver://"
-remote_base_log_folder =
-
-# The remote_task_handler_kwargs param is loaded into a dictionary and passed 
to __init__ of remote
-# task handler and it overrides the values provided by Airflow config. For 
example if you set
-# `delete_local_logs=False` and you provide ``{{"delete_local_copy": true}}``, 
then the local
-# log files will be deleted after they are uploaded to remote location.
-# Example: remote_task_handler_kwargs = {{"delete_local_copy": true}}
-remote_task_handler_kwargs =
-
-# Use server-side encryption for logs stored in S3
-encrypt_s3_logs = False
-
-# Logging level.
+#         airflow config list --defaults
 #
-# Supported values: ``CRITICAL``, ``ERROR``, ``WARNING``, ``INFO``, ``DEBUG``.
-logging_level = INFO
-
-# Logging level for celery. If not set, it uses the value of logging_level
+# This command will produce the output that you can copy to your configuration file and edit.
+# It will contain all the default configuration options nicely commented out, with examples and
+# all the values will be commented out so you can only un-comment and change those that you want to change.

Review Comment:
   Simplified and removed some duplication
   ```suggestion
   # It will contain all the default configuration options, with examples, nicely commented out
   # so you need only un-comment and modify those that you want to change.
   ```
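
A minimal sketch of why a fully commented-out file is safe to copy, assuming the generated output looks roughly like the excerpt below. The exact output format of `airflow config list --defaults` is an assumption here; the option names and values are taken from the removed template in this diff, and the helper name `overrides` is hypothetical. A standard INI parser sees no values until a line is un-commented, so only the options you change become overrides.

```python
import configparser

# Hypothetical excerpt in the style described above: every default is commented out.
GENERATED = """\
[core]
# The executor class that airflow should use.
# executor = SequentialExecutor

# Are DAGs paused by default at creation
# dags_are_paused_at_creation = True
"""


def overrides(cfg_text: str) -> dict:
    """Return the options a plain INI parser actually sees, grouped by section."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return {section: dict(parser[section]) for section in parser.sections()}


# Nothing is set while every value stays commented out.
untouched = overrides(GENERATED)

# Un-comment and modify a single option; only that one becomes an override.
edited = overrides(
    GENERATED.replace("# executor = SequentialExecutor", "executor = LocalExecutor")
)

print(untouched)
print(edited)
```

Options that stay commented out never reach the parser, so they continue to follow whatever defaults the installed Airflow version ships with.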



##########
docs/apache-airflow/howto/set-config.rst:
##########
@@ -21,8 +21,31 @@ Setting Configuration Options
 =============================
 
 The first time you run Airflow, it will create a file called ``airflow.cfg`` in
-your ``$AIRFLOW_HOME`` directory (``~/airflow`` by default). This file contains Airflow's configuration and you
-can edit it to change any of the settings. You can also set options with environment variables by using this format:
+your ``$AIRFLOW_HOME`` directory (``~/airflow`` by default). This is in order to make it easy to
+"play" with airflow configuration.
+
+However, for production case you are advised to generate the configuration using command line:
+
+.. code-block:: bash
+
+    airflow config list --defaults
+
+This command will produce the output that you can copy to your configuration file and edit.
+
+It will contain all the default configuration options nicely commented out, with examples and
+all the values will be commented out so you can only un-comment and change those that you want to change.

Review Comment:
   Same as above:
   ```suggestion
   It will contain all the default configuration options, with examples, nicely commented out
   so you need only un-comment and modify those that you want to change.
   ```



##########
airflow/config_templates/default_airflow.cfg:
##########
@@ -16,1507 +15,31 @@
 # specific language governing permissions and limitations
 # under the License.
 
-# This is the template for Airflow's default configuration. When Airflow is
-# imported, it looks for a configuration file at $AIRFLOW_HOME/airflow.cfg. If
-# it doesn't exist, Airflow uses this template to generate it by replacing
-# variables in curly braces with their global values from configuration.py.
-
-# Users should not modify this file; they should customize the generated
-# airflow.cfg instead.
-
-
-# ----------------------- TEMPLATE BEGINS HERE -----------------------
-
-[core]
-# The folder where your airflow pipelines live, most likely a
-# subfolder in a code repository. This path must be absolute.
-dags_folder = {AIRFLOW_HOME}/dags
-
-# Hostname by providing a path to a callable, which will resolve the hostname.
-# The format is "package.function".
 #
-# For example, default value "airflow.utils.net.getfqdn" means that result 
from patched
-# version of socket.getfqdn() - see 
https://github.com/python/cpython/issues/49254.
+# NOTE:
 #
-# No argument should be required in the function specified.
-# If using IP address as hostname is preferred, use value 
``airflow.utils.net.get_host_ip_address``
-hostname_callable = airflow.utils.net.getfqdn
-
-# A callable to check if a python file has airflow dags defined or not
-# with argument as: `(file_path: str, zip_file: zipfile.ZipFile | None = None)`
-# return True if it has dags otherwise False
-# If this is not provided, Airflow uses its own heuristic rules.
-might_contain_dag_callable = 
airflow.utils.file.might_contain_dag_via_default_heuristic
-
-# Default timezone in case supplied date times are naive
-# can be utc (default), system, or any IANA timezone string (e.g. 
Europe/Amsterdam)
-default_timezone = utc
-
-# The executor class that airflow should use. Choices include
-# ``SequentialExecutor``, ``LocalExecutor``, ``CeleryExecutor``, 
``DaskExecutor``,
-# ``KubernetesExecutor``, ``CeleryKubernetesExecutor`` or the
-# full import path to the class when using a custom executor.
-executor = SequentialExecutor
-
-# The auth manager class that airflow should use. Full import path to the auth 
manager class.
-auth_manager = airflow.auth.managers.fab.fab_auth_manager.FabAuthManager
-
-# This defines the maximum number of task instances that can run concurrently 
per scheduler in
-# Airflow, regardless of the worker count. Generally this value, multiplied by 
the number of
-# schedulers in your cluster, is the maximum number of task instances with the 
running
-# state in the metadata database.
-parallelism = 32
-
-# The maximum number of task instances allowed to run concurrently in each 
DAG. To calculate
-# the number of tasks that is running concurrently for a DAG, add up the 
number of running
-# tasks for all DAG runs of the DAG. This is configurable at the DAG level 
with ``max_active_tasks``,
-# which is defaulted as ``max_active_tasks_per_dag``.
+# IF YOU ARE LOOKING FOR DEFAULT CONFIGURATION FILE HERE  - LOOK NO MORE. READ EXPLANATION BELOW!
 #
-# An example scenario when this would be useful is when you want to stop a new 
dag with an early
-# start date from stealing all the executor slots in a cluster.
-max_active_tasks_per_dag = 16
-
-# Are DAGs paused by default at creation
-dags_are_paused_at_creation = True
-
-# The maximum number of active DAG runs per DAG. The scheduler will not create 
more DAG runs
-# if it reaches the limit. This is configurable at the DAG level with 
``max_active_runs``,
-# which is defaulted as ``max_active_runs_per_dag``.
-max_active_runs_per_dag = 16
-
-# The name of the method used in order to start Python processes via the 
multiprocessing module.
-# This corresponds directly with the options available in the Python docs:
-# 
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.set_start_method.
-# Must be one of the values returned by:
-# 
https://docs.python.org/3/library/multiprocessing.html#multiprocessing.get_all_start_methods.
-# Example: mp_start_method = fork
-# mp_start_method =
-
-# Whether to load the DAG examples that ship with Airflow. It's good to
-# get started, but you probably want to set this to ``False`` in a production
-# environment
-load_examples = True
-
-# Path to the folder containing Airflow plugins
-plugins_folder = {AIRFLOW_HOME}/plugins
-
-# Should tasks be executed via forking of the parent process ("False",
-# the speedier option) or by spawning a new python process ("True" slow,
-# but means plugin changes picked up by tasks straight away)
-execute_tasks_new_python_interpreter = False
-
-# Secret key to save connection passwords in the db
-fernet_key = {FERNET_KEY}
-
-# Whether to disable pickling dags
-donot_pickle = True
-
-# How long before timing out a python file import
-dagbag_import_timeout = 30.0
-
-# Should a traceback be shown in the UI for dagbag import errors,
-# instead of just the exception message
-dagbag_import_error_tracebacks = True
-
-# If tracebacks are shown, how many entries from the traceback should be shown
-dagbag_import_error_traceback_depth = 2
-
-# How long before timing out a DagFileProcessor, which processes a dag file
-dag_file_processor_timeout = 50
-
-# The class to use for running task instances in a subprocess.
-# Choices include StandardTaskRunner, CgroupTaskRunner or the full import path 
to the class
-# when using a custom task runner.
-task_runner = StandardTaskRunner
-
-# If set, tasks without a ``run_as_user`` argument will be run with this user
-# Can be used to de-elevate a sudo user running Airflow when executing tasks
-default_impersonation =
-
-# What security module to use (for example kerberos)
-security =
-
-# Turn unit test mode on (overwrites many configuration options with test
-# values at runtime)
-unit_test_mode = False
-
-# Whether to enable pickling for xcom (note that this is insecure and allows 
for
-# RCE exploits).
-enable_xcom_pickling = False
-
-# What classes can be imported during deserialization. This is a multi line 
value.
-# The individual items will be parsed as regexp. Python built-in classes (like 
dict)
-# are always allowed. Bare "." will be replaced so you can set airflow.* .
-allowed_deserialization_classes = airflow\..*
-
-# When a task is killed forcefully, this is the amount of time in seconds that
-# it has to cleanup after it is sent a SIGTERM, before it is SIGKILLED
-killed_task_cleanup_time = 60
-
-# Whether to override params with dag_run.conf. If you pass some key-value 
pairs
-# through ``airflow dags backfill -c`` or
-# ``airflow dags trigger -c``, the key-value pairs will override the existing 
ones in params.
-dag_run_conf_overrides_params = True
-
-# If enabled, Airflow will only scan files containing both ``DAG`` and 
``airflow`` (case-insensitive).
-dag_discovery_safe_mode = True
-
-# The pattern syntax used in the ".airflowignore" files in the DAG 
directories. Valid values are
-# ``regexp`` or ``glob``.
-dag_ignore_file_syntax = regexp
-
-# The number of retries each task is going to have by default. Can be 
overridden at dag or task level.
-default_task_retries = 0
-
-# The number of seconds each task is going to wait by default between retries. 
Can be overridden at
-# dag or task level.
-default_task_retry_delay = 300
-
-# The maximum delay (in seconds) each task is going to wait by default between 
retries.
-# This is a global setting and cannot be overridden at task or DAG level.
-max_task_retry_delay = 86400
-
-# The weighting method used for the effective total priority weight of the task
-default_task_weight_rule = downstream
-
-# The default task execution_timeout value for the operators. Expected an 
integer value to
-# be passed into timedelta as seconds. If not specified, then the value is 
considered as None,
-# meaning that the operators are never timed out by default.
-default_task_execution_timeout =
-
-# Updating serialized DAG can not be faster than a minimum interval to reduce 
database write rate.
-min_serialized_dag_update_interval = 30
-
-# If True, serialized DAGs are compressed before writing to DB.
-# Note: this will disable the DAG dependencies view
-compress_serialized_dags = False
-
-# Fetching serialized DAG can not be faster than a minimum interval to reduce 
database
-# read rate. This config controls when your DAGs are updated in the Webserver
-min_serialized_dag_fetch_interval = 10
-
-# Maximum number of Rendered Task Instance Fields (Template Fields) per task 
to store
-# in the Database.
-# All the template_fields for each of Task Instance are stored in the Database.
-# Keeping this number small may cause an error when you try to view 
``Rendered`` tab in
-# TaskInstance view for older tasks.
-max_num_rendered_ti_fields_per_task = 30
-
-# On each dagrun check against defined SLAs
-check_slas = True
-
-# Path to custom XCom class that will be used to store and resolve operators 
results
-# Example: xcom_backend = path.to.CustomXCom
-xcom_backend = airflow.models.xcom.BaseXCom
-
-# By default Airflow plugins are lazily-loaded (only loaded when required). 
Set it to ``False``,
-# if you want to load plugins whenever 'airflow' is invoked via cli or loaded 
from module.
-lazy_load_plugins = True
-
-# By default Airflow providers are lazily-discovered (discovery and imports 
happen only when required).
-# Set it to False, if you want to discover providers whenever 'airflow' is 
invoked via cli or
-# loaded from module.
-lazy_discover_providers = True
-
-# Hide sensitive Variables or Connection extra json keys from UI and task logs 
when set to True
+# This file used to have something that was similar to the default Airflow configuration but it was
+# really just a template. It was used to generate the final configuration and it was confusing
+# if you copied it to your configuration and some of values were wrong.
 #
-# (Connection passwords are always hidden in logs)
-hide_sensitive_var_conn_fields = True
-
-# A comma-separated list of extra sensitive keywords to look for in variables 
names or connection's
-# extra JSON.
-sensitive_var_conn_names =
-
-# Task Slot counts for ``default_pool``. This setting would not have any 
effect in an existing
-# deployment where the ``default_pool`` is already created. For existing 
deployments, users can
-# change the number of slots using Webserver, API or the CLI
-default_pool_task_slot_count = 128
-
-# The maximum list/dict length an XCom can push to trigger task mapping. If 
the pushed list/dict has a
-# length exceeding this value, the task pushing the XCom will be failed 
automatically to prevent the
-# mapped tasks from clogging the scheduler.
-max_map_length = 1024
-
-# The default umask to use for process when run in daemon mode (scheduler, 
worker,  etc.)
+# The first time you run Airflow, it will create a file called ``airflow.cfg`` in
+# your ``$AIRFLOW_HOME`` directory (``~/airflow`` by default). This is in order to make it easy to
+# "play" with airflow configuration.
 #
-# This controls the file-creation mode mask which determines the initial value 
of file permission bits
-# for newly created files.
+# However, for production case you are advised to generate the configuration using command line:
 #
-# This value is treated as an octal-integer.
-daemon_umask = 0o077
-
-# Class to use as dataset manager.
-# Example: dataset_manager_class = airflow.datasets.manager.DatasetManager
-# dataset_manager_class =
-
-# Kwargs to supply to dataset manager.
-# Example: dataset_manager_kwargs = {{"some_param": "some_value"}}
-# dataset_manager_kwargs =
-
-# (experimental) Whether components should use Airflow Internal API for DB 
connectivity.
-database_access_isolation = False
-
-# (experimental) Airflow Internal API url. Only used if [core] 
database_access_isolation is True.
-# Example: internal_api_url = http://localhost:8080
-# internal_api_url =
-
-# The ability to allow testing connections across Airflow UI, API and CLI.
-# Supported options: Disabled, Enabled, Hidden. Default: Disabled
-# Disabled - Disables the test connection functionality and disables the Test 
Connection button in UI.
-# Enabled - Enables the test connection functionality and shows the Test 
Connection button in UI.
-# Hidden - Disables the test connection functionality and hides the Test 
Connection button in UI.
-# Before setting this to Enabled, make sure that you review the users who are 
able to add/edit
-# connections and ensure they are trusted. Connection testing can be done 
maliciously leading to
-# undesired and insecure outcomes. For more information on capabilities of 
users, see the documentation:
-# https://airflow.apache.org/docs/apache-airflow/stable/security/index.html#capabilities-of-authenticated-ui-users
-test_connection = Disabled
-
-[database]
-# Path to the ``alembic.ini`` file. You can either provide the file path 
relative
-# to the Airflow home directory or the absolute path if it is located 
elsewhere.
-alembic_ini_file_path = alembic.ini
-
-# The SqlAlchemy connection string to the metadata database.
-# SqlAlchemy supports many different database engines.
-# More information here:
-# http://airflow.apache.org/docs/apache-airflow/stable/howto/set-up-database.html#database-uri
-sql_alchemy_conn = sqlite:///{AIRFLOW_HOME}/airflow.db
-
-# Extra engine specific keyword args passed to SQLAlchemy's create_engine, as 
a JSON-encoded value
-# Example: sql_alchemy_engine_args = {{"arg1": True}}
-# sql_alchemy_engine_args =
-
-# The encoding for the databases
-sql_engine_encoding = utf-8
-
-# Collation for ``dag_id``, ``task_id``, ``key``, ``external_executor_id`` 
columns
-# in case they have different encoding.
-# By default this collation is the same as the database collation, however for 
``mysql`` and ``mariadb``
-# the default is ``utf8mb3_bin`` so that the index sizes of our index keys 
will not exceed
-# the maximum size of allowed index when collation is set to ``utf8mb4`` 
variant
-# (see https://github.com/apache/airflow/pull/17603#issuecomment-901121618).
-# sql_engine_collation_for_ids =
-
-# If SqlAlchemy should pool database connections.
-sql_alchemy_pool_enabled = True
-
-# The SqlAlchemy pool size is the maximum number of database connections
-# in the pool. 0 indicates no limit.
-sql_alchemy_pool_size = 5
-
-# The maximum overflow size of the pool.
-# When the number of checked-out connections reaches the size set in pool_size,
-# additional connections will be returned up to this limit.
-# When those additional connections are returned to the pool, they are 
disconnected and discarded.
-# It follows then that the total number of simultaneous connections the pool 
will allow
-# is pool_size + max_overflow,
-# and the total number of "sleeping" connections the pool will allow is 
pool_size.
-# max_overflow can be set to ``-1`` to indicate no overflow limit;
-# no limit will be placed on the total number of concurrent connections. 
Defaults to ``10``.
-sql_alchemy_max_overflow = 10
-
-# The SqlAlchemy pool recycle is the number of seconds a connection
-# can be idle in the pool before it is invalidated. This config does
-# not apply to sqlite. If the number of DB connections is ever exceeded,
-# a lower config value will allow the system to recover faster.
-sql_alchemy_pool_recycle = 1800
-
-# Check connection at the start of each connection pool checkout.
-# Typically, this is a simple statement like "SELECT 1".
-# More information here:
-# https://docs.sqlalchemy.org/en/14/core/pooling.html#disconnect-handling-pessimistic
-sql_alchemy_pool_pre_ping = True
-
-# The schema to use for the metadata database.
-# SqlAlchemy supports databases with the concept of multiple schemas.
-sql_alchemy_schema =
-
-# Import path for connect args in SqlAlchemy. Defaults to an empty dict.
-# This is useful when you want to configure db engine args that SqlAlchemy 
won't parse
-# in connection string.
-# See https://docs.sqlalchemy.org/en/14/core/engines.html#sqlalchemy.create_engine.params.connect_args
-# Example: sql_alchemy_connect_args = {{"timeout": 30}}
-# sql_alchemy_connect_args =
-
-# Whether to load the default connections that ship with Airflow. It's good to
-# get started, but you probably want to set this to ``False`` in a production
-# environment
-load_default_connections = True
-
-# Number of times the code should be retried in case of DB Operational Errors.
-# Not all transactions will be retried as it can cause undesired state.
-# Currently it is only used in ``DagFileProcessor.process_file`` to retry 
``dagbag.sync_to_db``.
-max_db_retries = 3
-
-# Whether to run alembic migrations during Airflow start up. Sometimes this 
operation can be expensive,
-# and the users can assert the correct version through other means (e.g. 
through a Helm chart).
-# Accepts "True" or "False".
-check_migrations = True
-
-[logging]
-# The folder where airflow should store its log files.
-# This path must be absolute.
-# There are a few existing configurations that assume this is set to the 
default.
-# If you choose to override this you may need to update the 
dag_processor_manager_log_location and
-# dag_processor_manager_log_location settings as well.
-base_log_folder = {AIRFLOW_HOME}/logs
-
-# Airflow can store logs remotely in AWS S3, Google Cloud Storage or Elastic 
Search.
-# Set this to True if you want to enable remote logging.
-remote_logging = False
-
-# Users must supply an Airflow connection id that provides access to the 
storage
-# location. Depending on your remote logging service, this may only be used for
-# reading logs, not writing them.
-remote_log_conn_id =
-
-# Whether the local log files for GCS, S3, WASB and OSS remote logging should 
be deleted after
-# they are uploaded to the remote location.
-delete_local_logs = False
-
-# Path to Google Credential JSON file. If omitted, authorization based on `the Application Default
-# Credentials
-# <https://cloud.google.com/docs/authentication/production#finding_credentials_automatically>`__ will
-# be used.
-google_key_path =
-
-# Storage bucket URL for remote logging
-# S3 buckets should start with "s3://"
-# Cloudwatch log groups should start with "cloudwatch://"
-# GCS buckets should start with "gs://"
-# WASB buckets should start with "wasb" just to help Airflow select correct 
handler
-# Stackdriver logs should start with "stackdriver://"
-remote_base_log_folder =
-
-# The remote_task_handler_kwargs param is loaded into a dictionary and passed 
to __init__ of remote
-# task handler and it overrides the values provided by Airflow config. For 
example if you set
-# `delete_local_logs=False` and you provide ``{{"delete_local_copy": true}}``, 
then the local
-# log files will be deleted after they are uploaded to remote location.
-# Example: remote_task_handler_kwargs = {{"delete_local_copy": true}}
-remote_task_handler_kwargs =
-
-# Use server-side encryption for logs stored in S3
-encrypt_s3_logs = False
-
-# Logging level.
+#         airflow config list --defaults
 #
-# Supported values: ``CRITICAL``, ``ERROR``, ``WARNING``, ``INFO``, ``DEBUG``.
-logging_level = INFO
-
-# Logging level for celery. If not set, it uses the value of logging_level
+# This command will produce the output that you can copy to your configuration file and edit.
+# It will contain all the default configuration options nicely commented out, with examples and
+# all the values will be commented out so you can only un-comment and change those that you want to change.
+# This way you can easily keep track of all the configuration options that you changed from default
+# and you can also easily upgrade your installation to new versions of Airflow when they come out and
+# Automatically use the defaults for existing options if they changed there.

Review Comment:
   Small typo fix. Also this is **such an amazing side effect of this approach!** :rocket:
   ```suggestion
   # automatically use the defaults for existing options if they changed there.
   ```



##########
docs/apache-airflow/howto/set-config.rst:
##########
@@ -21,8 +21,31 @@ Setting Configuration Options
 =============================
 
 The first time you run Airflow, it will create a file called ``airflow.cfg`` in
-your ``$AIRFLOW_HOME`` directory (``~/airflow`` by default). This file contains Airflow's configuration and you
-can edit it to change any of the settings. You can also set options with environment variables by using this format:
+your ``$AIRFLOW_HOME`` directory (``~/airflow`` by default). This is in order to make it easy to
+"play" with airflow configuration.
+
+However, for production case you are advised to generate the configuration using command line:
+
+.. code-block:: bash
+
+    airflow config list --defaults
+
+This command will produce the output that you can copy to your configuration file and edit.
+
+It will contain all the default configuration options nicely commented out, with examples and
+all the values will be commented out so you can only un-comment and change those that you want to change.
+This way you can easily keep track of all the configuration options that you changed from default
+and you can also easily upgrade your installation to new versions of Airflow when they come out and
+Automatically use the defaults for existing options if they changed there.

Review Comment:
   Same as above:
   ```suggestion
   automatically use the defaults for existing options if they changed there.
   ```
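
A rough illustration of why the commented-out defaults described in the diff are useful: if every default stays commented out, only the options that were deliberately un-commented are parsed, so the file itself records what differs from the defaults. This is a minimal sketch that uses Python's standard `configparser`, not Airflow's own configuration parser. The excerpt and the helper name `explicit_overrides` are hypothetical and chosen only for illustration.

```python
import configparser

# Hypothetical excerpt in the style the diff describes: defaults are
# commented out, and only the option being changed is un-commented.
GENERATED = """
[core]
# lazy_load_plugins = True
# max_map_length = 1024
max_map_length = 2048

[logging]
# logging_level = INFO
"""


def explicit_overrides(cfg_text: str) -> dict[str, dict[str, str]]:
    """Return only the options that are actually set (not commented out)."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # Lines starting with '#' are ignored by configparser, so sections that
    # contain only commented-out defaults come back empty and are skipped.
    return {
        section: dict(parser.items(section))
        for section in parser.sections()
        if parser.items(section)
    }


if __name__ == "__main__":
    print(explicit_overrides(GENERATED))
```

Under these assumptions, any option left commented out never appears in the parsed result, which is what allows a new release's changed defaults to apply without editing the file.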



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
