On Wed, Sep 7, 2016 at 12:17 PM, Bolke de Bruin <[email protected]> wrote:

> Ah, this is the more interesting case. Are you getting tasks into SCHEDULED
> and then the scheduler itself gets stuck? Or do the workers not execute
> anything anymore?

The tasks are put into the SCHEDULED state but they don't make it to a
worker. This isn't deterministic. With our patch to clean up orphans, a
task may flap in SCHEDULED a few times, but eventually it makes it to a
worker. The scheduler and workers are otherwise running fine. We've been
running with the same celery/redis setup for a year.

> How do you run your scheduler? With num_runs?

We don't use num_runs. We restart the scheduler when we deploy new code.

> A later patch checks for these “orphaned_tasks” at scheduler start up.

We check for the orphans at the top of the scheduler loop, so on every run.

> In other words, can you provide some more information :-).
>
> Bolke
>
>> Op 7 sep. 2016, om 20:08 heeft Jeff Balogh <[email protected]> het
>> volgende geschreven:
>>
>> Ah yep, we're on
>> https://github.com/apache/incubator-airflow/commits/54b361d2a.
>>
>> On Wed, Sep 7, 2016 at 10:13 AM, Bolke de Bruin <[email protected]> wrote:
>>> Hi Jeff,
>>>
>>> That is kind of impossible for 1.7.1.3, as the SCHEDULED state was
>>> introduced after release. Are you sure you are on 1.7.1.3 and not on
>>> master?
>>>
>>> Bolke
>>>
>>>> Op 7 sep. 2016, om 18:37 heeft Jeff Balogh <[email protected]>
>>>> het volgende geschreven:
>>>>
>>>> When we bumped to 1.7.1.3 we found that tasks would go into the new
>>>> SCHEDULED state and get stuck there. We haven't determined why this
>>>> happens.
>>>>
>>>> We put a hacky patch into our scheduler that sets state to None for
>>>> any tasks that are SCHEDULED at the beginning of the schedule loop.
>>>>
>>>> Name: airflow
>>>> Version: 1.7.1.3
>>>> Name: celery
>>>> Version: 3.1.23
>>>> Name: kombu
>>>> Version: 3.0.35
>>>>
>>>> redis_version: 2.6.13
>>>>
>>>> On Sun, Sep 4, 2016 at 6:34 AM, Bolke de Bruin <[email protected]> wrote:
>>>>> Hi All,
>>>>>
>>>>> We have had some reports on this list, and sometimes on Jira, that
>>>>> the scheduler sometimes seems to get stuck. I would like to track
>>>>> down this issue, but until now much of the reporting has been a bit
>>>>> light on details.
>>>>>
>>>>> First and foremost, I am assuming that getting "stuck" only happens
>>>>> when using a CeleryExecutor. To further track down the issue, I would
>>>>> like to know the following:
>>>>>
>>>>> - Airflow version (pip show airflow)
>>>>> - Celery version (pip show celery)
>>>>> - Kombu version (pip show kombu)
>>>>>
>>>>> - Redis version (if applicable)
>>>>> - RabbitMQ version (if applicable)
>>>>>
>>>>> - Sanitized airflow configuration
>>>>> - Sanitized broker configuration
>>>>>
>>>>> If possible, supply logs (preferably at debug level) of the broker,
>>>>> scheduler and worker.
>>>>>
>>>>> Thanks!
>>>>> Bolke
>>>>>
>>>
>
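[Editor's note] The "hacky patch" Jeff describes above, resetting any task stuck in SCHEDULED back to the None state at the top of the scheduler loop, can be sketched roughly as follows. This is a simplified, self-contained illustration only: `TaskInstance` here is a minimal stand-in for Airflow's ORM model, and the actual patch on Airflow 1.7.1.3 would instead issue an UPDATE against the `task_instance` table via the session.

```python
# Hedged sketch (not the actual patch): reset "orphaned" SCHEDULED tasks
# so the scheduler re-examines them on its next pass. TaskInstance is a
# toy stand-in for Airflow's model; the real code would query and update
# the task_instance table through SQLAlchemy.

SCHEDULED = "scheduled"

class TaskInstance:
    def __init__(self, task_id, state):
        self.task_id = task_id
        self.state = state

def reset_orphaned_scheduled_tasks(task_instances):
    """Set state back to None for tasks stuck in SCHEDULED.

    Intended to run at the top of every scheduler loop, so a task that
    was marked SCHEDULED but never reached a worker is picked up again
    instead of staying stuck.
    """
    reset = []
    for ti in task_instances:
        if ti.state == SCHEDULED:
            ti.state = None  # None means "eligible for scheduling again"
            reset.append(ti.task_id)
    return reset

# Example: the two stuck tasks are reset, the running one is untouched.
tis = [TaskInstance("a", SCHEDULED),
       TaskInstance("b", "running"),
       TaskInstance("c", SCHEDULED)]
print(reset_orphaned_scheduled_tasks(tis))  # -> ['a', 'c']
```

As the thread notes, this is a workaround, not a root-cause fix: a task may "flap" through SCHEDULED several times before a worker finally picks it up.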
