[ 
https://issues.apache.org/jira/browse/AIRFLOW-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16383497#comment-16383497
 ] 

ASF subversion and git services commented on AIRFLOW-2065:
----------------------------------------------------------

Commit b1deb3318f3fae4e21860bbd6e6463ef644aea8d in incubator-airflow's branch 
refs/heads/master from [~blinkseb]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=b1deb33 ]

[AIRFLOW-2065] Fix race-conditions when creating loggers

If two or more workers start at the same time,
they will execute the same operations to create
the output directories for storing the log files.
This can lead to a race condition when, for
example, worker A creates the directory right
after worker B's non-existence check; worker B
then also tries to create the directory, which
already exists.

Closes #3040 from blinkseb/fix-airflow-2065
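
The race fixed here is the classic check-then-create pattern. A minimal sketch of an idempotent directory-creation helper (the function name is illustrative, not the actual Airflow code):

```python
import os

def ensure_log_directory(path):
    """Create the log directory, tolerating another worker
    creating it between our check and the mkdir call."""
    try:
        os.makedirs(path)
    except FileExistsError:
        # A concurrent worker won the race; the directory exists,
        # which is exactly the state we wanted.
        pass

# On Python >= 3.2 the same effect is available directly:
# os.makedirs(path, exist_ok=True)
```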


> Worker logging can raise FileExistsError when more than one process execute 
> concurrently
> ----------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2065
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2065
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor, logging
>    Affects Versions: 1.9.0
>            Reporter: Sébastien Brochet
>            Priority: Critical
>             Fix For: 2.0.0
>
>
> Hello,
>  
> We started observing random failures during the execution of our DAGs after 
> upgrading to 1.9.0. After careful debugging, we noticed the following 
> exception in the worker logs:
> {noformat}
> Traceback (most recent call last):
>    File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", 
> line 558, in configure
>      handler = self.configure_handler(handlers[name])
>    File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", 
> line 731, in configure_handler
>      result = factory(**kwargs)
>    File 
> "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/utils/log/file_processor_handler.py",
>  line 48, in __init__
>      os.makedirs(self._get_log_directory())
>    File "/projects/airflow-hadoop/anaconda3/lib/python3.6/os.py", line 220, 
> in makedirs
>      mkdir(name, mode)
>  FileExistsError: [Errno 17] File exists: 
> '/projects/airflow-hadoop/airflow/logs/scheduler/2018-02-05'
> During handling of the above exception, another exception occurred:
> Traceback (most recent call last):
>    File "/projects/airflow-hadoop/anaconda3/bin/airflow", line 16, in <module>
>      from airflow import configuration
>    File 
> "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/__init__.py",
>  line 31, in <module>
>      from airflow import settings
>    File 
> "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/settings.py",
>  line 148, in <module>
>      configure_logging()
>    File 
> "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/logging_config.py",
>  line 75, in configure_logging
>      raise e
>    File 
> "/projects/airflow-hadoop/anaconda3/lib/python3.6/site-packages/airflow/logging_config.py",
>  line 70, in configure_logging
>      dictConfig(logging_config)
>    File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", 
> line 795, in dictConfig
>      dictConfigClass(config).configure()
>    File "/projects/airflow-hadoop/anaconda3/lib/python3.6/logging/config.py", 
> line 566, in configure
>      '%r: %s' % (name, e))
>  ValueError: Unable to configure handler 'file.processor': [Errno 17] File 
> exists: '/projects/airflow-hadoop/airflow/logs/scheduler/2018-02-05
> {noformat}
>  
> As you can see, an exception is raised when trying to create the directory 
> where the executor logs are stored. This can happen if two tasks are scheduled 
> at the exact same time on the same worker. That appears to be the case here:
>  
> {noformat}
>  [2018-02-05 02:10:07,886] \{celery_executor.py:50} INFO - Executing command 
> in Celery: airflow run xxxx pairing_sensor_check 2018-02-04T02:10:00 --local 
> --pool sensor -sd /projects/airflow-hadoop/airflow/dags/flow.py
>  [2018-02-05 02:10:07,908] \{celery_executor.py:50} INFO - Executing command 
> in Celery: airflow run yyy pairing_sensor_check 2018-02-04T02:10:00 --local 
> --pool sensor -sd /projects/airflow-hadoop/airflow/dags/flow.py
> {noformat}
> The culprit is here: 
> [https://github.com/apache/incubator-airflow/blob/v1-9-stable/airflow/utils/log/file_processor_handler.py#L47-L48]
>  (not fixed in master)
> A simple fix would be to wrap the {{makedirs}} call in a {{try}} / 
> {{except}} block.
>  
> Thanks,
>  
> Sébastien



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
