[ 
https://issues.apache.org/jira/browse/AIRFLOW-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15514258#comment-15514258
 ] 

Bolke de Bruin commented on AIRFLOW-366:
----------------------------------------

Another way to work around this is to use syslog instead. We probably won't fix 
this issue in 1.7.1.3, as this has changed considerably in master.

To use syslog instead of file logging:

import logging
import sys

from airflow import settings


def setup_logging(filename):
    from logging.handlers import SysLogHandler

    root = logging.getLogger()
    # handler = logging.FileHandler(filename)
    handler = SysLogHandler(address='/dev/log')
    formatter = logging.Formatter(settings.SIMPLE_LOG_FORMAT)
    handler.setFormatter(formatter)
    root.addHandler(handler)
    root.setLevel(settings.LOGGING_LEVEL)

    # Unlike FileHandler, SysLogHandler has no .stream attribute, so return
    # sys.stdout instead (assumption: the caller only needs a writable stream).
    return sys.stdout


You might need to replace "address='/dev/log'" with "address=('localhost', 514)", 
as /dev/log is still a file descriptor.
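
For reference, a minimal sketch of the network variant (the host and port here 
are assumptions; point them at your own syslog daemon, which listens on UDP port 
514 by default):

from logging.handlers import SysLogHandler

# A (host, port) tuple makes SysLogHandler use a network socket (UDP by default).
handler = SysLogHandler(address=('localhost', 514))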

> SchedulerJob gets locked up when child processes attempt to log to a 
> single file
> -----------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-366
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-366
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>            Reporter: Greg Neiheisel
>            Assignee: Bolke de Bruin
>
> After running the scheduler for a while (usually after 1 - 5 hours) it will 
> eventually lock up, and nothing will get scheduled.
> A `SchedulerJob` will end up getting stuck in the `while` loop around line 
> 730 of `airflow/jobs.py`.
> From what I can tell, this is related to logging from within a forked process 
> using Python's multiprocessing module.
> The job will fork off some child processes to process the DAGs, but one (or 
> more) will end up getting stuck and not terminating, resulting in the while 
> loop getting hung up.  You can `kill -9 PID` the child process manually, and 
> the loop will end and the scheduler will go on its way, until it happens 
> again.
> The issue is due to usage of the logging module from within the child 
> processes.  From what I can tell, logging to a file from multiple processes 
> is not supported by the multiprocessing module, but it is supported with 
> Python multithreading, which uses a locking mechanism.
> I think a child process will somehow inherit a logger that is locked right 
> when it is forked, resulting in the process completely locking up.
> I went in and commented out all the logging statements that could possibly be 
> hit by the child process (jobs.py, models.py), and was able to keep the 
> scheduler alive.
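
For reference, a minimal sketch (not Airflow code) of the failure mode described 
above: if fork() happens while another thread holds a logging handler's lock, the 
child inherits the lock in its acquired state and its first log call blocks 
forever. It will not hang on every run, since it depends on the fork landing 
while the lock is held (Unix only).

import logging
import os
import threading
import time

logging.basicConfig(level=logging.INFO, format='%(process)d %(message)s')
log = logging.getLogger()

def spam():
    # Keep the root handler's internal lock busy from a second thread.
    while True:
        log.info("parent thread logging")

threading.Thread(target=spam, daemon=True).start()
time.sleep(0.1)

pid = os.fork()
if pid == 0:
    # If the fork landed while the thread above held the handler lock, the
    # child inherited that lock already acquired, and this call never returns.
    log.info("child logging")
    os._exit(0)
os.waitpid(pid, 0)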



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
