James Meickle created AIRFLOW-3449:
--------------------------------------
Summary: Airflow DAG parsing logs aren't written when using S3
logging
Key: AIRFLOW-3449
URL: https://issues.apache.org/jira/browse/AIRFLOW-3449
Project: Apache Airflow
Issue Type: Bug
Components: logging, scheduler
Affects Versions: 1.10.1, 1.10.0
Reporter: James Meickle
The default Airflow logging config writes some logs to stdout, some to "task"
folders, and some to "processor" folders (generated during DAG parsing).
The 1.10.0 logging update broke the processor logs, but only for users who are
also using S3 remote logging. This is because of this feature in the default
logging config file:
{code:python}
if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
DEFAULT_LOGGING_CONFIG['handlers'].update(REMOTE_HANDLERS['s3'])
{code}
That replaces this functioning handlers block:
{code:python}
    'task': {
        'class': 'airflow.utils.log.file_task_handler.FileTaskHandler',
        'formatter': 'airflow',
        'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
        'filename_template': FILENAME_TEMPLATE,
    },
    'processor': {
        'class': 'airflow.utils.log.file_processor_handler.FileProcessorHandler',
        'formatter': 'airflow',
        'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
        'filename_template': PROCESSOR_FILENAME_TEMPLATE,
    },
{code}
With this non-functioning block:
{code:python}
    'task': {
        'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
        'formatter': 'airflow',
        'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
        's3_log_folder': REMOTE_BASE_LOG_FOLDER,
        'filename_template': FILENAME_TEMPLATE,
    },
    'processor': {
        'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
        'formatter': 'airflow',
        'base_log_folder': os.path.expanduser(PROCESSOR_LOG_FOLDER),
        's3_log_folder': REMOTE_BASE_LOG_FOLDER,
        'filename_template': PROCESSOR_FILENAME_TEMPLATE,
    },
{code}
The key issue here is that both "task" and "processor" are handed the same
"S3TaskHandler" class to use for logging. But that is not a generic S3 handler;
it's actually a subclass of FileTaskHandler!
https://github.com/apache/incubator-airflow/blob/1.10.1/airflow/utils/log/s3_task_handler.py#L26
Because the variables that handler supplies don't match the placeholders in the
processor's filename template, the log path evaluates to garbage. The handler
then silently fails to log anything at all.
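To make the failure concrete, here is a minimal sketch (not Airflow's actual
rendering code) of why the path goes bad: the handler fills its
filename_template with task-style variables, while the processor template
expects a {{ filename }} variable that is never supplied. The template strings
below match the 1.10.x defaults as I recall them, and the context keys are
illustrative assumptions.
{code:python}
# Minimal sketch of the template mismatch; not Airflow's rendering code.
from jinja2 import Template

# Templates as in the 1.10.x default config (assumed):
FILENAME_TEMPLATE = '{{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log'
PROCESSOR_FILENAME_TEMPLATE = '{{ filename }}.log'

# Variables a task-style handler would supply (illustrative keys only):
task_style_context = {'ts': '2018-12-04T00:00:00', 'try_number': 1}

# Jinja renders undefined variables as empty strings by default, so the
# processor template collapses to just '.log' -- a garbage path:
print(Template(PROCESSOR_FILENAME_TEMPLATE).render(**task_style_context))  # -> '.log'
{code}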
It is likely that anyone using a default-like logging config, plus the remote
S3 logging feature, stopped getting DAG parsing logs (either locally *or* in
S3) as of 1.10.0.
Commenting out the DAG parsing section of the S3 block fixed this on my
instance.
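For reference, a sketch of that workaround inside a copy of the default
logging config, assuming the surrounding names (DEFAULT_LOGGING_CONFIG,
REMOTE_LOGGING, REMOTE_BASE_LOG_FOLDER, BASE_LOG_FOLDER, FILENAME_TEMPLATE)
are defined as in the default file: only the "task" handler is swapped to S3,
and the "processor" entry is left on the local FileProcessorHandler.
{code:python}
# Workaround sketch: swap only the 'task' handler to S3 and leave the
# 'processor' handler on the local FileProcessorHandler defined above,
# so DAG parsing logs are written again.
if REMOTE_LOGGING and REMOTE_BASE_LOG_FOLDER.startswith('s3://'):
    DEFAULT_LOGGING_CONFIG['handlers'].update({
        'task': {
            'class': 'airflow.utils.log.s3_task_handler.S3TaskHandler',
            'formatter': 'airflow',
            'base_log_folder': os.path.expanduser(BASE_LOG_FOLDER),
            's3_log_folder': REMOTE_BASE_LOG_FOLDER,
            'filename_template': FILENAME_TEMPLATE,
        },
        # 'processor' intentionally omitted: it keeps the FileProcessorHandler
        # from the default handlers block.
    })
{code}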