[ https://issues.apache.org/jira/browse/AIRFLOW-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16071748#comment-16071748 ]
Mike Perry commented on AIRFLOW-366:
------------------------------------
We're still seeing this in Airflow 1.8.1. Any other thoughts on a possible
workaround? We've tried removing all log statements from jobs.py and models.py,
and replacing setup_logging per [~bolke]'s syslog suggestion above.
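
For reference, the syslog-style replacement we tried looks roughly like the
sketch below. The setup_logging name, signature, and format string here are
assumptions for illustration, not the exact patch from this ticket; the idea
is simply to route the root logger to the local syslog socket so multiple
processes stop appending to the same log file:

```python
import logging
from logging.handlers import SysLogHandler

def setup_logging(log_format="airflow: %(levelname)s %(message)s"):
    # Send records to the local syslog daemon instead of a shared log
    # file, so the scheduler's forked children are not all writing to
    # one FileHandler-owned file.
    root = logging.getLogger()
    handler = SysLogHandler(address="/dev/log")  # Linux local socket
    handler.setFormatter(logging.Formatter(log_format))
    root.addHandler(handler)
    root.setLevel(logging.INFO)
```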
> SchedulerJob gets locked up when child processes attempt to log to a
> single file
> -----------------------------------------------------------------------------------
>
> Key: AIRFLOW-366
> URL: https://issues.apache.org/jira/browse/AIRFLOW-366
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Reporter: Greg Neiheisel
> Assignee: Bolke de Bruin
>
> After running the scheduler for a while (usually after 1 - 5 hours) it will
> eventually lock up, and nothing will get scheduled.
> A `SchedulerJob` will end up getting stuck in the `while` loop around line
> 730 of `airflow/jobs.py`.
> From what I can tell, this is related to logging from within a forked
> process using Python's multiprocessing module.
> The job will fork off some child processes to process the DAGs, but one (or
> more) will end up getting stuck and not terminating, resulting in the while
> loop getting hung up. You can `kill -9 PID` the stuck child process
> manually, and the loop will end and the scheduler will go on its way, until
> it happens again.
> The issue is due to usage of the logging module from within the child
> processes. From what I can tell, logging to a single file from multiple
> processes is not supported: the logging module's per-handler locks make it
> safe across threads within one process, but not across forked processes
> (a queue-based alternative is sketched below the quoted description).
> I think a child process can inherit a handler lock that was held at the
> moment of the fork, leaving the lock permanently held in the child and the
> process completely locked up (a standalone repro is sketched below).
> I went in and commented out all the logging statements that could possibly be
> hit by the child process (jobs.py, models.py), and was able to keep the
> scheduler alive.
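
To make the suspected mechanism concrete, here is a minimal standalone repro
sketch (not Airflow code; file names are made up for illustration). It
assumes Linux's fork start method and an older interpreter (CPython 3.7 and
later acquire the logging locks around fork, which should mitigate exactly
this), and it may take several runs to actually trigger:

```python
import logging
import multiprocessing
import threading
import time

logging.basicConfig(filename="/tmp/fork_lock_demo.log", level=logging.INFO)
log = logging.getLogger("demo")

def writer():
    # Hammer the FileHandler so its internal lock is frequently held.
    while True:
        log.info("parent writer thread holding the handler lock")

def child():
    # If the fork landed while the lock was held, this call never
    # returns: the child inherited a locked lock and the thread that
    # owned it does not exist in the child to release it.
    log.info("child logging after fork")

if __name__ == "__main__":
    threading.Thread(target=writer, daemon=True).start()
    time.sleep(0.2)  # let the writer thread spin up
    p = multiprocessing.Process(target=child)  # fork start method on Linux
    p.start()
    p.join(timeout=5)
    print("child still alive after 5s (deadlocked):", p.is_alive())
```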
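
As for the "not supported" part: one standard remedy, per the Python logging
cookbook, is to let a single process own the file and ship records to it over
a queue. A sketch of that pattern follows (Python 3.2+ for QueueHandler and
QueueListener; shown as background, not a proposed Airflow patch):

```python
import logging
import logging.handlers
import multiprocessing

def worker(queue):
    # Children never touch the log file: they only enqueue records.
    root = logging.getLogger()
    root.handlers = [logging.handlers.QueueHandler(queue)]
    root.setLevel(logging.INFO)
    root.info("hello from pid %s", multiprocessing.current_process().pid)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    file_handler = logging.FileHandler("/tmp/queued_demo.log")
    # A single listener thread in the parent does all the file writing.
    listener = logging.handlers.QueueListener(queue, file_handler)
    listener.start()
    procs = [multiprocessing.Process(target=worker, args=(queue,))
             for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    listener.stop()
```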