[ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16089802#comment-16089802 ]
Rick Otten commented on AIRFLOW-401:
------------------------------------
The scheduler restarts itself routinely. I have no idea why, unless it is to
clear these stale child processes. At the moment, even though I never run more
than 10 tasks at a time, I've bumped parallelism up to 128. I still end up
consuming all 128 slots while the database backup task is running; it just
takes a while to use them all up. It seems like a poor workaround. The real
issue is that whatever handles the "child exit" codes is not detecting that
child processes have finished.
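The slot leak described above can be illustrated with a small, hypothetical sketch. It is not Airflow's executor code: `SlotTracker`, `launch` and `reap` are invented names. It assumes each task runs as a child process and that a slot counted against `parallelism` is freed only when the parent collects the child's exit code via `subprocess.Popen.poll()`. If that reaping step never notices a finished child, the slot stays "in use", which would match all 128 slots filling up while only about 10 tasks actually run.

```python
import subprocess
import sys
import time


class SlotTracker:
    """Hypothetical illustration (not Airflow code): a fixed number of task
    slots ("parallelism") that are freed only when a finished child is reaped."""

    def __init__(self, parallelism):
        self.parallelism = parallelism
        self.running = []  # Popen handles for children still counted as running

    def open_slots(self):
        return self.parallelism - len(self.running)

    def launch(self, code):
        """Start a child Python interpreter running `code` if a slot is free."""
        if self.open_slots() <= 0:
            return False  # no free slot: the task has to wait
        self.running.append(subprocess.Popen([sys.executable, "-c", code]))
        return True

    def reap(self):
        """Collect exit codes of finished children and free their slots."""
        still_running, finished = [], []
        for proc in self.running:
            # poll() returns None while the child is alive, else its exit code.
            if proc.poll() is None:
                still_running.append(proc)
            else:
                finished.append(proc.returncode)
        self.running = still_running
        return finished


if __name__ == "__main__":
    tracker = SlotTracker(parallelism=2)
    tracker.launch("pass")
    tracker.launch("pass")
    # Both slots are taken, even if the children have already exited,
    # because nothing has reaped them yet.
    print(tracker.launch("pass"))  # False
    deadline = time.monotonic() + 10
    codes = []
    while len(codes) < 2 and time.monotonic() < deadline:
        codes += tracker.reap()
        time.sleep(0.05)
    print(codes, tracker.open_slots())  # [0, 0] 2 once both have been reaped
```

In this sketch, skipping `reap()` (or having it miss exits) means `open_slots()` never recovers, so raising `parallelism` only delays the point at which launching stops.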
> scheduler gets stuck without a trace
> ------------------------------------
>
> Key: AIRFLOW-401
> URL: https://issues.apache.org/jira/browse/AIRFLOW-401
> Project: Apache Airflow
> Issue Type: Bug
> Components: executor, scheduler
> Affects Versions: Airflow 1.7.1.3
> Reporter: Nadeem Ahmed Nazeer
> Assignee: Bolke de Bruin
> Priority: Minor
> Labels: celery, kombu
> Attachments: Dag_code.txt, schduler_cpu100%.png,
> scheduler_stuck_7hours.png, scheduler_stuck.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU
> usage of the scheduler service is at 100%. No jobs get submitted and everything
> comes to a halt. It looks like it goes into some kind of infinite loop.
> The only way I could make it run again is by manually restarting the
> scheduler service. But after running some tasks, it gets stuck again. I've
> tried both the Celery and Local executors, but the same issue occurs. I am
> using the -n 3 parameter when starting the scheduler.
> Scheduler configs:
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)