On Wed, Sep 7, 2016 at 12:17 PM, Bolke de Bruin <[email protected]> wrote:
> Ah this is the more interesting case. Are you getting tasks into SCHEDULED 
> and then the scheduler itself gets stuck? Or do the workers not execute 
> anything anymore?

The tasks are put into the SCHEDULED state but they don't make it to a
worker. This isn't deterministic. With our patch to clean up orphans,
a task may flap in and out of SCHEDULED a few times, but eventually it
makes it to a worker.

The scheduler and workers are otherwise running fine. We've been
running with the same celery/redis setup for a year.

> How do you run your scheduler? With num_runs?

We don't use num_runs. We restart the scheduler when we deploy new code.

> A later patch checks for these “orphaned_tasks” at scheduler start up.

We check for the orphans at the top of the scheduler loop, so on every run.
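
Roughly, the idea looks like the sketch below. This is not our actual
patch and it doesn't touch Airflow's models or database session: the
TaskInstance class, the SCHEDULED constant and reset_scheduled_tasks()
are hypothetical in-memory stand-ins, just to show the shape of the
workaround.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Hypothetical stand-in for the SCHEDULED state name.
SCHEDULED = "scheduled"


@dataclass
class TaskInstance:
    """Hypothetical in-memory stand-in for a task instance record."""
    task_id: str
    state: Optional[str] = None


def reset_scheduled_tasks(task_instances: Iterable[TaskInstance]) -> List[str]:
    """Set state back to None for every task currently in SCHEDULED.

    Returns the ids of the tasks that were reset, so the caller can log them.
    """
    reset_ids = []
    for ti in task_instances:
        if ti.state == SCHEDULED:
            ti.state = None  # make the task eligible for scheduling again
            reset_ids.append(ti.task_id)
    return reset_ids


def scheduler_loop(task_instances: List[TaskInstance], runs: int) -> None:
    """Toy loop: the reset happens at the top of every iteration."""
    for _ in range(runs):
        reset_ids = reset_scheduled_tasks(task_instances)
        if reset_ids:
            print("reset orphaned tasks:", reset_ids)
        # ... the normal scheduling pass would follow here ...
```

Note that this resets every SCHEDULED task on each pass, including
ones that may have been about to reach a worker, which would be
consistent with the flapping described above.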

> In other words can you provide some more information :-).
>
> Bolke
>
>> On 7 Sep 2016, at 20:08, Jeff Balogh <[email protected]> wrote:
>>
>> Ah yep, we're on 
>> https://github.com/apache/incubator-airflow/commits/54b361d2a.
>>
>> On Wed, Sep 7, 2016 at 10:13 AM, Bolke de Bruin <[email protected]> wrote:
>>> Hi Jeff,
>>>
>>> That is kind of impossible for 1.7.1.3 as the SCHEDULED state was 
>>> introduced after that release. Are you sure you are on 1.7.1.3 and not 
>>> on master?
>>>
>>> Bolke
>>>
>>>> On 7 Sep 2016, at 18:37, Jeff Balogh <[email protected]> wrote:
>>>>
>>>> When we bumped to 1.7.1.3 we found that tasks would go into the new
>>>> SCHEDULED state and get stuck there. We haven't determined why this
>>>> happens.
>>>>
>>>> We put a hacky patch into our scheduler that sets state to None for
>>>> any tasks that are SCHEDULED at the beginning of the schedule loop.
>>>>
>>>> Name: airflow
>>>> Version: 1.7.1.3
>>>> Name: celery
>>>> Version: 3.1.23
>>>> Name: kombu
>>>> Version: 3.0.35
>>>>
>>>> redis_version:2.6.13
>>>>
>>>> On Sun, Sep 4, 2016 at 6:34 AM, Bolke de Bruin <[email protected]> wrote:
>>>>> Hi All,
>>>>>
>>>>> We have had some reports on this list, and occasionally on Jira, that 
>>>>> the scheduler sometimes seems to get stuck. I would like to track down 
>>>>> this issue, but so far the reports have been a bit light on details.
>>>>>
>>>>> First and foremost, I am assuming that getting “stuck” only happens 
>>>>> when using the CeleryExecutor. To track down the issue further, I would 
>>>>> like to know the following:
>>>>>
>>>>> - Airflow version (pip show airflow)
>>>>> - Celery version (pip show celery)
>>>>> - Kombu version (pip show kombu)
>>>>>
>>>>> - Redis version (if applicable)
>>>>> - RabbitMQ version (if applicable)
>>>>>
>>>>> - Sanitized airflow configuration
>>>>> - Sanitized broker configuration
>>>>>
>>>>> If possible, please also supply logs (preferably debug-level) from the 
>>>>> broker, scheduler and worker.
>>>>>
>>>>> Thanks!
>>>>> Bolke
>>>>>
>>>
>
