mik-laj commented on a change in pull request #6644: [AIRFLOW-6047] Simplify the logging configuration template
URL: https://github.com/apache/airflow/pull/6644#discussion_r350099913
##########
File path: airflow/config_templates/airflow_local_settings.py
##########
@@ -148,26 +128,61 @@
}
}
-REMOTE_HANDLERS = {
-    's3': {
+# Only update the handlers and loggers when CONFIG_PROCESSOR_MANAGER_LOGGER is set.
+# This is to avoid exceptions when initializing RotatingFileHandler multiple times
+# in multiple processes.
+if os.environ.get('CONFIG_PROCESSOR_MANAGER_LOGGER') == 'True':
+    DEFAULT_LOGGING_CONFIG['handlers'] \
+        .update(DEFAULT_DAG_PARSING_LOGGING_CONFIG['handlers'])
+    DEFAULT_LOGGING_CONFIG['loggers'] \
+        .update(DEFAULT_DAG_PARSING_LOGGING_CONFIG['loggers'])
+
+    # Manually create log directory for processor_manager handler as RotatingFileHandler
+    # will only create file but not the directory.
+    processor_manager_handler_config = DEFAULT_DAG_PARSING_LOGGING_CONFIG['handlers'][
+        'processor_manager']
+    directory = os.path.dirname(processor_manager_handler_config['filename'])
+    mkdirs(directory, 0o755)
+
+# Remote logging configuration
+
+# Storage bucket URL for remote logging
+# S3 buckets should start with "s3://"
+# GCS buckets should start with "gs://"
+# WASB buckets should start with "wasb" just to help Airflow select correct handler
+REMOTE_BASE_LOG_FOLDER = conf.get('core', 'REMOTE_BASE_LOG_FOLDER')
+
+ELASTICSEARCH_HOST = conf.get('elasticsearch', 'HOST')
+
+REMOTE_LOGGING = conf.getboolean('core', 'remote_logging')
+
+if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
+    S3_REMOTE_HANDLERS = {
         'task': {
             'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
             'formatter': 'airflow',
Review comment:
This is not common to all handlers, so it will be problematic. My
Stackdriver handler uses the following configuration:
https://github.com/PolideaInternal/airflow/blob/e2511a74bfdd3824845ae037e4a50de127c223d6/airflow/config_templates/airflow_local_settings.py
```python
gcp_conn_id = conf.get('core', 'REMOTE_LOG_CONN_ID', fallback=None)
# stackdriver:///airflow-tasks => airflow-tasks
REMOTE_BASE_LOG_FOLDER = urlparse(REMOTE_BASE_LOG_FOLDER).path[1:]
STACKDRIVER_REMOTE_HANDLERS = {
    'task': {
        'class': 'airflow.utils.log.stackdriver_task_handler.StackdriverTaskHandler',
        'formatter': 'airflow',
        'name': REMOTE_BASE_LOG_FOLDER,
        'gcp_conn_id': gcp_conn_id
    }
}
DEFAULT_LOGGING_CONFIG['handlers'].update(STACKDRIVER_REMOTE_HANDLERS)
```
I'm also afraid that pulling out only part of the configuration into a
separate variable will make it harder to understand. This is not classic
code that must follow DRY rules to avoid problems. This is a configuration file
where each block has a different purpose. The blocks look similar, but each has
its own separate role. Above all, this file should be easy to understand and
adapt to the specific cases of our users.
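To illustrate the point about the blocks only looking similar, here is a minimal, self-contained sketch (the option values are illustrative placeholders, drawn from the snippets in this thread, not the real Airflow defaults): each backend's handler block carries a different set of keys, so a shared template would end up special-casing them anyway.

```python
# Sketch: two remote-logging handler blocks kept as separate literals.
# The shared keys ('class', 'formatter') look like DRY candidates, but the
# backend-specific keys ('name', 'gcp_conn_id') differ per handler.
S3_REMOTE_HANDLERS = {
    'task': {
        'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
        'formatter': 'airflow',
    }
}

STACKDRIVER_REMOTE_HANDLERS = {
    'task': {
        'class': 'airflow.utils.log.stackdriver_task_handler.StackdriverTaskHandler',
        'formatter': 'airflow',
        'name': 'airflow-tasks',               # placeholder bucket path
        'gcp_conn_id': 'google_cloud_default',  # only Stackdriver needs this
    }
}

# The Stackdriver block has keys the S3 block does not.
extra_keys = set(STACKDRIVER_REMOTE_HANDLERS['task']) - set(S3_REMOTE_HANDLERS['task'])
print(sorted(extra_keys))
```

Keeping each block whole means a user adapting this file only has to read and edit one literal for their backend.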
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services