Hi Harvey,

I created https://github.com/apache/incubator-airflow/pull/1948, which should resolve the issue for you. I’m not sure this is the right approach, but I tested it locally and it works. Please report back!

Bolke

> On 18 Dec 2016, at 21:09, Bolke de Bruin <[email protected]> wrote:
>
> (also reported this on the Jira issue)
>
> OK, I figured out the issue. In short: the scheduler checks the task
> instances without taking into account whether the executor has already
> reported back. In this case the executor reports back several iterations
> later, but the task is queued nevertheless. Because a task will not enter
> the queue while it is considered running, the task state stays "queued"
> indefinitely, in limbo between the scheduler and the executor.
>
> The SequentialExecutor does not have this issue, as it waits for every
> task to finish before returning. I’m not quite sure about Celery yet.
>
> Fixing this will take a bit more time, as I’m unfamiliar with the code in
> this area (the calling code, that is). @max @dan @paul I really could use
> your help here.
>
> - Bolke
>
>> On 15 Dec 2016, at 22:33, Bolke de Bruin <[email protected]> wrote:
>>
>> I’m having a look now but haven’t found the cause yet. The line that
>> reports the issue is just a facade in the UI, and it might not even report
>> the real cause. I.e. the task is being sent to the executor but already
>> seems to be part of queued_tasks, and then the executor reports success
>> without actually running the task itself.
>>
>> Paul and Dan were involved with this code and it was heavily changed, so I
>> have to familiarize myself with it.
>>
>> - Bolke
>>
>>> On 13 Dec 2016, at 19:23, Harvey Xia <[email protected]> wrote:
>>>
>>> Hi Bolke,
>>>
>>> I have tried it on the latest release (1.7.1.3) and can confirm that
>>> retries *do* work. We are forced to use a later commit because we require
>>> a working GCP (Google Cloud Platform) hook, which did not seem to work on
>>> the latest release (upon glancing at the commit history, I think it's due
>>> to the fact that the latest release does not use the latest version of a
>>> Google client). Another colleague of ours is using a version of Airflow
>>> that works with GCP and also does not suffer from this retry issue, so we
>>> could always use that one. But I wanted to raise this issue and try to
>>> understand why it's occurring. Let me know your thoughts, thanks!
>>>
>>> Harvey Xia | Software Engineer
>>> [email protected]
>>> +1 (339) 225 1875
>>>
>>> On Tue, Dec 13, 2016 at 1:17 PM, Bolke de Bruin <[email protected]> wrote:
>>>
>>>> Hey Harvey,
>>>>
>>>> I don’t have the time to dive in right now, but is this bound to the
>>>> particular commit or did you just grab master at a specific point in
>>>> time?
>>>>
>>>> Did you try it on 1.7.1.3? Are you forced to use master?
>>>>
>>>> - Bolke
>>>>
>>>>> On 13 Dec 2016, at 16:43, Harvey Xia <[email protected]> wrote:
>>>>>
>>>>> Hello,
>>>>>
>>>>> I'm an engineer at Spotify, and our team has recently started using
>>>>> Airflow. I have posted the following issue,
>>>>> https://issues.apache.org/jira/browse/AIRFLOW-695, but was hoping to
>>>>> get in contact with someone about this question. It is currently
>>>>> blocking us, so any response would be greatly appreciated. Thanks so
>>>>> much!
>>>>>
>>>>> Harvey Xia | Software Engineer
>>>>> [email protected]
>>>>> +1 (339) 225 1875
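
For readers following the thread, the "limbo" state described in the 18 Dec message can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyExecutor`, `simulate`, the `wait_for_executor` flag and the task key `"t1"` are hypothetical names invented for the illustration, none of this is Airflow code, and it makes no claim about what the pull request actually changes. It only shows how a scheduler that marks a task "queued" while the executor still considers it running can leave the task stuck.

```python
class ToyExecutor:
    """Hypothetical executor that reports a finished task only after a delay."""

    def __init__(self, report_delay):
        self.report_delay = report_delay
        self.queued = []
        self.running = {}  # task key -> heartbeats left until it reports back

    def queue(self, key):
        # A task the executor still considers queued or running is rejected.
        if key in self.queued or key in self.running:
            return False
        self.queued.append(key)
        return True

    def heartbeat(self):
        # Report back (forget) tasks whose delay has elapsed.
        for key in list(self.running):
            self.running[key] -= 1
            if self.running[key] == 0:
                del self.running[key]
        # Start everything that was queued and return the started keys.
        started = list(self.queued)
        for key in started:
            self.running[key] = self.report_delay
        self.queued.clear()
        return started


def simulate(wait_for_executor, loops=6):
    """Run a toy scheduler loop and return the final state of task 't1'."""
    executor = ToyExecutor(report_delay=3)
    # Assumption: the first attempt has already failed, but the executor
    # has not reported back yet, so it still lists the task as running.
    executor.running["t1"] = executor.report_delay
    state = "up_for_retry"
    for _ in range(loops):
        if state == "up_for_retry":
            busy = "t1" in executor.running
            if not (wait_for_executor and busy):
                state = "queued"      # the scheduler marks the task queued ...
                executor.queue("t1")  # ... even if the executor rejects it
        if "t1" in executor.heartbeat():
            state = "success"         # the retry actually ran
    return state


if __name__ == "__main__":
    print(simulate(wait_for_executor=False))
    print(simulate(wait_for_executor=True))
```

With `wait_for_executor=False` the toy scheduler marks the task queued while the executor rejects it, and nothing ever re-submits it. With `wait_for_executor=True` the scheduler holds off until the executor has reported back, so the retry is accepted and runs.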
