[ 
https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15733251#comment-15733251
 ] 

Bolke de Bruin commented on AIRFLOW-401:
----------------------------------------

It would be nice to know which versions of rabbitmq/celery you are using 
(including their Python library versions). I wouldn't recommend the CPU time 
limit, since at the moment scheduler time increases with the number of tasks 
in a DAG. 
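
One way to collect the Python library versions asked for here is sketched below. It is only an illustration: the package list (airflow, celery, kombu, amqp) is an assumption about what is installed, the helper name installed_versions is hypothetical, and importlib.metadata needs Python 3.8 or newer (on older interpreters `pip freeze` gives the same information). The RabbitMQ server version is not covered by this; `rabbitmqctl status` reports it.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed package list; adjust it to match the actual environment.
    wanted = ["airflow", "celery", "kombu", "amqp"]
    for name, version in installed_versions(wanted).items():
        print("%s: %s" % (name, version or "not installed"))
```

Running it inside the same virtualenv as the scheduler matters, since a different environment may report different versions.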

The interesting part is obviously the children being stuck waiting for data 
from rabbitmq. It would be great to have more logging from that. Lowering your 
TCP/IP timeout settings might also help.
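
As one possible reading of the TCP/IP timeout suggestion, the sketch below turns on TCP keepalive for a single socket so that a dead peer is noticed sooner. It is a generic illustration, not how Airflow or Celery configure their broker connections: the helper name enable_keepalive and the timing values are hypothetical, and the TCP_KEEP* options are platform-specific, so they are applied only where the socket module exposes them. System-wide defaults are normally changed at the OS level instead.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Enable TCP keepalive on sock; timing values are illustrative only."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These per-socket tuning options exist on Linux but not on every platform.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of idleness before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # failed probes before the connection is dropped
    )
    for option_name, value in tuning:
        option = getattr(socket, option_name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


if __name__ == "__main__":
    s = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With the values above, an unresponsive peer would be detected after roughly idle + interval * count seconds on platforms that honour all three options.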

> scheduler gets stuck without a trace
> ------------------------------------
>
>                 Key: AIRFLOW-401
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-401
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor, scheduler
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Nadeem Ahmed Nazeer
>            Assignee: Bolke de Bruin
>            Priority: Minor
>         Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, 
> scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU 
> usage of the scheduler service is at 100%. No jobs get submitted and 
> everything comes to a halt. It looks like it goes into some kind of infinite 
> loop. 
> The only way I could make it run again is by manually restarting the 
> scheduler service. But again, after running some tasks it gets stuck. I've 
> tried with both the Celery and Local executors, but the same issue occurs. I 
> am using the -n 3 parameter while starting the scheduler. 
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
