[ https://issues.apache.org/jira/browse/AIRFLOW-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16353579#comment-16353579 ]
Sébastien Brochet edited comment on AIRFLOW-2065 at 2/6/18 8:54 AM: -------------------------------------------------------------------- Fixing in our production instance the above issue, we noticed another exception later in the code, around here: [https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/file_processor_handler.py#L105-L108] {noformat} Traceback (most recent call last): File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", line 558, in configure handler = self.configure_handler(handlers[name]) File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", line 731, in configure_handler result = factory(**kwargs) File "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", line 50, in __init__ self._symlink_latest_log_directory() File "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", line 106, in _symlink_latest_log_directory os.unlink(latest_log_directory_path) FileNotFoundError: [Errno 2] No such file or directory: '/projects/airflow-hadoop/airflow/logs/scheduler/latest' {noformat} Same issue here, two processes are trying to delete the same directory. was (Author: blinkseb): Fixing in our production instance the above issue, we noticed another exception later in the code, around here: [https://github.com/apache/incubator-airflow/blob/master/airflow/utils/log/file_processor_handler.py#L105-L108] ``` Traceback (most recent call last): File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", line 558, in configure handler = self.configure_handler(handlers[name]) File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", line 731, in configure_handler result = factory(**kwargs) File "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", line 50, in __init__ self._symlink_latest_log_directory() File "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", line 106, in _symlink_latest_log_directory os.unlink(latest_log_directory_path) FileNotFoundError: [Errno 2] No such file or directory: '/projects/airflow-hadoop/airflow/logs/scheduler/latest' ``` Same issue here, two processes are trying to delete the same directory. > Worker logging can raise FileExistsError when more than one process execute > concurrently > ---------------------------------------------------------------------------------------- > > Key: AIRFLOW-2065 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2065 > Project: Apache Airflow > Issue Type: Bug > Components: executor, logging > Affects Versions: 1.9.0 > Reporter: Sébastien Brochet > Priority: Critical > > Hello, > > We started observing random failing during the execution of our dags after > upgrading to 1.9.0. After careful debugging, we noticing the following > exception in the worker logs: > {noformat} > Traceback (most recent call last): > File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", > line 558, in configure > handler = self.configure_handler(handlers[name]) > File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", > line 731, in configure_handler > result = factory(**kwargs) > File > "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py", > line 48, in __init__ > os.makedirs(self._get_log_directory()) > File "/projects/airflow-hadoop/anaconda3/lib/python3.6/os.py", line 220, > in makedirs > mkdir(name, mode) > FileExistsError: [Errno 17] File exists: > '/projects/airflow-hadoop/airflow/logs/scheduler/2018-02-05' > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/projects/airflow-hadoop/anaconda3/bin/airflow", line 16, in <module> > from airflow import configuration > File > "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/__init__.py", > line 31, in <module> > from airflow import settings > File > "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/settings.py", > line 148, in <module> > configure_logging() > File > "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/logging_config.py", > line 75, in configure_logging > raise e > File > "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/logging_config.py", > line 70, in configure_logging > dictConfig(logging_config) > File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", > line 795, in dictConfig > dictConfigClass(config).configure() > File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", > line 566, in configure > '%r: %s' % (name, e)) > ValueError: Unable to configure handler 'file.processor': [Errno 17] File > exists: '/projects/airflow-hadoop/airflow/logs/scheduler/2018-02-05 > {noformat} > > As you can see, an exception is raised when trying to create the directory > where to store the executor logs. This can happen if two tasks are scheduled > are the exact same time on the same worker. It appears to be the case here : > > {noformat} > [2018-02-05 02:10:07,886] \{celery_executor.py:50} INFO - Executing command > in Celery: airflow run xxxx pairing_sensor_check 2018-02-04T02:10:00 --local > --pool sensor -sd /projects/airflow-hadoop/airflow/dags/flow.py > [2018-02-05 02:10:07,908] \{celery_executor.py:50} INFO - Executing command > in Celery: airflow run yyy pairing_sensor_check 2018-02-04T02:10:00 --local > --pool sensor -sd /projects/airflow-hadoop/airflow/dags/flow.py > {noformat} > Culprits is here: > [https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/utils/log/file_processor_handler.py#L47-L48] > (not fixed in master) > A simple fix would be to wrap the {{makedirs}} command into a {{try}} / > {{catch}} block. > > Thanks, > > Sébastien -- This message was sent by Atlassian JIRA (v7.6.3#76005)