Hila Visan commented on AIRFLOW-366:

any update?

> SchedulerJob gets locked up when when child processes attempt to log to 
> single file
> -----------------------------------------------------------------------------------
>                 Key: AIRFLOW-366
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-366
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>            Reporter: Greg Neiheisel
>            Assignee: Bolke de Bruin
> After running the scheduler for a while (usually after 1 - 5 hours) it will 
> eventually lock up, and nothing will get scheduled.
> A `SchedulerJob` will end up getting stuck in the `while` loop around line 
> 730 of `airflow/jobs.py`.
> From what I can tell this is related to logging from within a forked process 
> using pythons multiprocessing module.
> The job will fork off some child processes to process the DAGs but one (or 
> more) will end up getting suck and not terminating, resulting in the while 
> loop getting hung up.  You can `kill -9 PID` the child process manually, and 
> the loop will end and the scheduler will go on it's way, until it happens 
> again.
> The issue is due to usage of the logging module from within the child 
> processes.  From what I can tell, logging to a file from multiple processes 
> is not supported by the multiprocessing module, but it is supported using 
> python multithreading, using some sort of locking mechanism.
> I think a child process will somehow inherit a logger that is locked, right 
> when it is forked, resulting it the process completely locking up.
> I went in and commented out all the logging statements that could possibly be 
> hit by the child process (jobs.py, models.py), and was able to keep the 
> scheduler alive.

This message was sent by Atlassian JIRA

Reply via email to