> On 7 Sep 2016, at 21:37, Jeff Balogh <[email protected]> wrote:
>
> On Wed, Sep 7, 2016 at 12:17 PM, Bolke de Bruin <[email protected]> wrote:
>> Ah this is the more interesting case. Are you getting tasks into SCHEDULED
>> and then the scheduler itself gets stuck? Or do the workers not execute
>> anything anymore?
>
> The tasks are put into the SCHEDULED state but they don't make it to a
> worker. This isn't deterministic. With our patch to clean up orphans,
> a task may flap in SCHEDULED a few times but eventually it makes it to
> a worker.

Ok. So are you implying that the executor is not picking up the tasks, or
that the queue is losing tasks? Are you able to find out what redis is doing
when a 'scheduled' task is flapping, i.e. does it receive the task at all?
Btw, what happened before the SCHEDULED state was introduced?

>
> The scheduler and workers are otherwise running fine. We've been
> running with the same celery/redis setup for a year.
>
>> How do you run your scheduler? With num_runs?
>
> We don't use num_runs. We restart the scheduler when we deploy new code.
>
>> A later patch checks for these "orphaned_tasks" at scheduler start up.
>
> We check for the orphans at the top of the scheduler loop, so on every run.

Ok, we moved away from this for performance reasons. Depending on the
solution to the above issue, we might need to apply it on every run again.

>
>> In other words can you provide some more information :-).
>>
>> Bolke
>>
>>> On 7 Sep 2016, at 20:08, Jeff Balogh <[email protected]> wrote:
>>>
>>> Ah yep, we're on
>>> https://github.com/apache/incubator-airflow/commits/54b361d2a.
>>>
>>> On Wed, Sep 7, 2016 at 10:13 AM, Bolke de Bruin <[email protected]> wrote:
>>>> Hi Jeff,
>>>>
>>>> That is kind of impossible for 1.7.1.3 as the SCHEDULED state was
>>>> introduced after release. Are you sure you are on 1.7.1.3 and not on
>>>> master?
>>>>
>>>> Bolke
>>>>
>>>>> On 7 Sep 2016, at 18:37, Jeff Balogh <[email protected]> wrote:
>>>>>
>>>>> When we bumped to 1.7.1.3 we found that tasks would go into the new
>>>>> SCHEDULED state and get stuck there. We haven't determined why this
>>>>> happens.
>>>>>
>>>>> We put a hacky patch into our scheduler that sets state to None for
>>>>> any tasks that are SCHEDULED at the beginning of the schedule loop.
>>>>>
>>>>> Name: airflow
>>>>> Version: 1.7.1.3
>>>>> Name: celery
>>>>> Version: 3.1.23
>>>>> Name: kombu
>>>>> Version: 3.0.35
>>>>>
>>>>> redis_version:2.6.13
>>>>>
>>>>> On Sun, Sep 4, 2016 at 6:34 AM, Bolke de Bruin <[email protected]> wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> We have had some reports on this list and sometimes on Jira that the
>>>>>> scheduler sometimes seems to get stuck. I would like to track down this
>>>>>> issue, but until now much of the reporting has been a bit light on the
>>>>>> details.
>>>>>>
>>>>>> First and foremost I am assuming that getting "stuck" is only happening
>>>>>> when using a CeleryExecutor. To further track down the issue I would
>>>>>> like to know the following
>>>>>>
>>>>>> - Airflow version (pip show airflow)
>>>>>> - Celery version (pip show celery)
>>>>>> - Kombu version (pip show kombu)
>>>>>>
>>>>>> - Redis version (if applicable)
>>>>>> - RabbitMQ version (if applicable)
>>>>>>
>>>>>> - Sanitized airflow configuration
>>>>>> - Sanitized broker configuration
>>>>>>
>>>>>> If possible supply, preferably debug, logs of broker, scheduler and
>>>>>> worker.
>>>>>>
>>>>>> Thanks!
>>>>>> Bolke
>>>>>>
>>>>
>>
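
For anyone following along, the workaround Jeff describes (setting state to
None for any tasks that are SCHEDULED at the beginning of the schedule loop)
could look roughly like the sketch here. This is not Jeff's patch and not
Airflow's code: the TaskInstance class, the SCHEDULED constant and the
reset_scheduled_tasks helper are all hypothetical stand-ins, and the tasks
live in a plain in-memory list rather than the Airflow metadata database.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for the state constant; not imported from Airflow.
SCHEDULED = "scheduled"


@dataclass
class TaskInstance:
    """Hypothetical, simplified task record (not Airflow's model)."""
    task_id: str
    state: Optional[str] = None


def reset_scheduled_tasks(task_instances: List[TaskInstance]) -> List[str]:
    """Set every SCHEDULED task back to None so it is considered again.

    Returns the ids of the tasks that were reset. The idea is to call this
    at the top of each scheduler loop iteration.
    """
    reset_ids = []
    for ti in task_instances:
        if ti.state == SCHEDULED:
            ti.state = None
            reset_ids.append(ti.task_id)
    return reset_ids


# Example: only the task stuck in SCHEDULED is touched.
tasks = [
    TaskInstance("extract", SCHEDULED),
    TaskInstance("load", "running"),
    TaskInstance("report"),
]
reset_ids = reset_scheduled_tasks(tasks)
```

Running this on every loop is the trade-off discussed above: it lets a
flapping task eventually reach a worker, but it adds work to each scheduler
run, which is why the check was moved to scheduler start-up.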
