Hi Harvey,

I created a pull request (https://github.com/apache/incubator-airflow/pull/1948)
that should fix the issue for you. I’m not sure it is the right approach, but I
tested it locally and it does work. Please report back!
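For anyone following along: the limbo described in the quoted analysis below can be reproduced in a small toy model. This is only a sketch under simplifying assumptions: one task, no retry delay, no Celery, and no real executor event buffer. `ToyExecutor`, `scheduler_pass`, `simulate` and the `check_executor` guard are hypothetical names, not Airflow's classes. The guard is one possible way to avoid the race and is not necessarily what the PR does.

```python
# Toy model of the scheduler/executor race. All names are hypothetical;
# this is not Airflow's scheduler or executor code.

QUEUEABLE_STATES = ("scheduled", "up_for_retry")


class ToyExecutor:
    def __init__(self):
        self.queued_tasks = set()  # keys accepted but not started yet
        self.running = set()       # keys the executor considers running

    def queue_command(self, key):
        """Accept a task unless it is already queued or considered running."""
        if key in self.queued_tasks or key in self.running:
            return False  # silently refused: this is where the task gets lost
        self.queued_tasks.add(key)
        return True

    def heartbeat(self, db):
        """Start every queued task; the task marks itself running in the db."""
        for key in list(self.queued_tasks):
            self.queued_tasks.discard(key)
            self.running.add(key)
            db[key] = "running"

    def report_back(self, key):
        """The executor finally notices that the task process has exited."""
        self.running.discard(key)


def scheduler_pass(db, executor, key, check_executor):
    """One scheduler iteration for a single task instance."""
    if db[key] not in QUEUEABLE_STATES:
        return  # "queued" is never re-queued in this model
    if check_executor and (key in executor.queued_tasks or key in executor.running):
        return  # hypothetical guard: wait until the executor has reported back
    db[key] = "queued"
    executor.queue_command(key)  # a refusal is not reflected back in the db


def simulate(check_executor):
    executor = ToyExecutor()
    key = ("example_dag", "example_task", "2016-12-13")
    db = {key: "scheduled"}  # stands in for the metadata database

    scheduler_pass(db, executor, key, check_executor)
    executor.heartbeat(db)

    # The task fails and marks itself up_for_retry in the database,
    # but the executor has not reported back yet.
    db[key] = "up_for_retry"
    scheduler_pass(db, executor, key, check_executor)
    executor.heartbeat(db)

    # A few iterations later the executor reports back.
    executor.report_back(key)
    for _ in range(3):
        scheduler_pass(db, executor, key, check_executor)
        executor.heartbeat(db)

    return db[key], key in executor.running


print(simulate(check_executor=False))  # ('queued', False): stuck in limbo
print(simulate(check_executor=True))   # ('running', True): retry is picked up
```

With the guard off, the task ends up “queued” and nothing is running it. With the guard on, the scheduler waits for the executor to report back before queueing the retry.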

Bolke

> On 18 Dec 2016, at 21:09, Bolke de Bruin <[email protected]>
> wrote:
> 
> (also reported this on the Jira issue)
> 
> OK, I figured out the issue. In short: the scheduler checks the task
> instances without taking into account whether the executor has already
> reported back. In this case the executor reports back several iterations
> later, but the task is queued nevertheless. Because a task will not enter the
> queue while it is considered running, its state will remain “queued”
> indefinitely, in limbo between the scheduler and the executor.
> 
> The SequentialExecutor does not have this issue, as it waits for every
> task to finish before returning. I’m not quite sure about Celery yet.
> 
> Fixing this will take a bit more time, as I’m unfamiliar with the code in
> this area (that is, the calling code). @max @dan @paul, I could really use
> your help here.
> 
> - Bolke
> 
>> On 15 Dec 2016, at 22:33, Bolke de Bruin <[email protected]>
>> wrote:
>> 
>> I’m having a look now but haven’t found the cause yet. The line that
>> reports the issue is just a facade in the UI and might not even reflect the
>> real cause. That is, the task is sent to the executor but already seems to
>> be part of queued_tasks, and then the executor reports success without
>> actually running the task.
>> 
>> Paul and Dan were involved with this code and it was heavily changed, so I
>> have to familiarize myself with it.
>> 
>> - Bolke
>> 
>>> On 13 Dec 2016, at 19:23, Harvey Xia <[email protected]>
>>> wrote:
>>> 
>>> Hi Bolke,
>>> 
>>> I have tried it on the latest release (1.7.1.3) and can confirm that
>>> retries *do* work. We are forced to use a later commit because we require a
>>> working GCP (Google Cloud Platform) hook, which did not seem to work on the
>>> latest release (from a glance at the commit history, I think this is
>>> because the latest release does not use the latest version of a Google
>>> client). Another colleague of ours is using a version of Airflow that works
>>> with GCP and also does not suffer from this retry issue, so we could always
>>> use that one. But I wanted to raise this issue and try to understand why
>>> it's occurring. Let me know your thoughts, thanks!
>>> 
>>> 
>>> Harvey Xia | Software Engineer
>>> [email protected]
>>> +1 (339) 225 1875
>>> 
>>> On Tue, Dec 13, 2016 at 1:17 PM, Bolke de Bruin <[email protected]> wrote:
>>> 
>>>> Hey Harvey,
>>>> 
>>>> I don’t have the time to dive in right now, but is this bound to the
>>>> particular commit or did you just grab master at a specific point in time?
>>>> 
>>>> Did you try it on 1.7.1.3? Are you forced to use master?
>>>> 
>>>> - Bolke
>>>> 
>>>>> On 13 Dec 2016, at 16:43, Harvey Xia <[email protected]>
>>>>> wrote:
>>>>> 
>>>>> Hello,
>>>>> 
>>>>> I'm an engineer at Spotify, and our team has recently started using
>>>>> Airflow. I have posted the following issue,
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-695, but was hoping to get
>>>>> in contact with someone about this question. It is currently blocking
>>>>> us, so any response would be greatly appreciated. Thanks so much!
>>>>> 
>>>>> Harvey Xia | Software Engineer
>>>>> [email protected]
>>>>> +1 (339) 225 1875
>>>> 
>>>> 
>> 
> 
