This is weird, and I hope it's not a bad UTC conversion tricking me.
The Splunk logs for the worker show the process log was created at 9am
("Logging into: ...."), but the first entry in the log was at 14:00
("Filling up the DagBag"). If I go to the DB and calculate the queue time,
this specific DAG was delayed 5 hours, which matches the logs.
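
A minimal sketch of how the UTC suspicion above could be sanity-checked. It assumes one timestamp was recorded in local time and the other in UTC. The date, the two example timestamps, the -5 offset and the helper names are all hypothetical illustrations, not values taken from the actual logs or from Airflow itself.

```python
from datetime import datetime, timedelta, timezone


def gap_in_utc(first: datetime, second: datetime) -> timedelta:
    """Return second - first after normalising both aware timestamps to UTC."""
    return second.astimezone(timezone.utc) - first.astimezone(timezone.utc)


def explained_by_offset(local_naive: datetime, utc_naive: datetime,
                        offset_hours: int) -> bool:
    """True if the gap vanishes when local_naive is read as UTC+offset_hours
    and utc_naive is read as UTC, i.e. the "delay" is only a display artifact."""
    local_tz = timezone(timedelta(hours=offset_hours))
    local_aware = local_naive.replace(tzinfo=local_tz)
    utc_aware = utc_naive.replace(tzinfo=timezone.utc)
    return gap_in_utc(local_aware, utc_aware) == timedelta(0)


# Hypothetical values: log file created at 09:00, first log entry at 14:00.
created = datetime(2018, 3, 19, 9, 0)
first_entry = datetime(2018, 3, 19, 14, 0)

print(first_entry - created)                           # naive difference
print(explained_by_offset(created, first_entry, -5))   # assumed UTC-5 host
print(explained_by_offset(created, first_entry, 0))    # both already in UTC
```

If the check comes back true for the worker's real offset, the gap in the logs could be a timezone artifact. The queue time computed from the DB showing the same 5 hours would argue against that, provided both DB timestamps are stored in the same timezone.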
On Mon, Mar 19, 2018, 9:10 AM David Capwell <[email protected]> wrote:
> The main reason we have been waiting is that 1.8.2 and 1.9 are
> backwards incompatible (I don't remember the details off the top of my
> head, but one operator broke imports, so everything failed for us), so we
> neglected doing the work to support both versions (we need to support
> both since different teams move at different rates).
>
> We need to do this anyways (frozen in time is very bad).
>
> On Mon, Mar 19, 2018, 1:47 AM Driesprong, Fokko <[email protected]>
> wrote:
>
>> Hi David,
>>
>> First, I would update to Apache Airflow 1.9.0; there have been a lot of
>> fixes between 1.8.2 and 1.9.0. That would show whether the bug is still
>> in there.
>>
>> Cheers, Fokko
>>
>> 2018-03-18 19:41 GMT+01:00 David Capwell <[email protected]>:
>>
>> > Thanks for the reply
>> >
>> > Our script doesn't set it, so it should be off; the process does not
>> > normally restart (monitoring has a counter for the number of restarts
>> > since deploy, currently at 0).
>> >
>> > At that point in time the UI showed the upstream tasks as green
>> > (success); we manually ran the tasks, so they are no longer in the
>> > same state and I can't check the UI right now.
>> >
>> > On Sun, Mar 18, 2018, 11:34 AM Bolke de Bruin <[email protected]>
>> > wrote:
>> >
>> > > Are you running with num_runs? If so, disable it. We have seen this
>> > > behavior with num_runs. You can also find out whether there is a
>> > > dependency issue by clicking on the task.
>> > >
>> > > B.
>> > >
>> > > Sent from my iPad
>> > >
>> > > > On 18 Mar 2018, at 19:08, David Capwell <[email protected]>
>> > > > wrote:
>> > > >
>> > > > We just started seeing this a few days ago after turning on SLAs
>> > > > for our tasks (not saying SLA did this; it may have been happening
>> > > > before without us noticing). We have a DAG that runs once an hour,
>> > > > and we see that 4-5 DAG runs are marked running but their tasks
>> > > > are not getting scheduled. When we get the SLA alert, what we do
>> > > > right now is go to the UI and click run on the tasks manually;
>> > > > this is only needed for the oldest DAG run, and the rest recover
>> > > > after that. In the past 3 days this has happened to us twice.
>> > > >
>> > > > We are running 1.8.2; are there any known JIRA issues about this?
>> > > > I don't know the scheduler well; what could I do to see why these
>> > > > tasks are getting skipped unless we intervene manually?
>> > > >
>> > > > Thanks for your time.
>> > >
>> >
>>
>