Current theory is priority: this dag is high fanout, whereas other DAGs are much deeper. Looking at the scheduler code and the database, this looks hard to prove since I only see logs for success and the DB doesn't distinguish between runnable and not scheduled; is there a good way to check schedule delay?
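For reference, here is a rough sketch of the "calculate queue time from the DB" approach mentioned below. `scheduling_delay` is a hypothetical helper, not anything in Airflow itself; it assumes you pull `execution_date` and `start_date` from the `task_instance` table, and it accounts for Airflow's convention that a run stamped with `execution_date` only becomes runnable one schedule interval later:

```python
from datetime import datetime, timedelta

def scheduling_delay(execution_date, start_date, schedule_interval):
    """Approximate how long a task instance sat waiting to be scheduled.

    A run stamped with execution_date only becomes runnable at
    execution_date + schedule_interval, so the delay is measured from
    that point rather than from execution_date itself.
    """
    runnable_at = execution_date + schedule_interval
    return start_date - runnable_at

# hypothetical values, shaped like the 9am -> 14:00 gap described below
delay = scheduling_delay(
    execution_date=datetime(2018, 3, 19, 9, 0),
    start_date=datetime(2018, 3, 19, 14, 0),
    schedule_interval=timedelta(hours=1),
)
```

This won't distinguish "runnable but starved" from "not yet scheduled", but a large delay across many task instances of one dag would at least support the priority/fanout theory.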
On Mon, Mar 19, 2018, 6:15 PM David Capwell <[email protected]> wrote:

> Ignore that, must be something with splunk since stdout doesn't have a
> date field; the same process writing to a file is printing that out, and
> "Filling up the DagBag" is before that line...
>
> On Mon, Mar 19, 2018, 5:35 PM David Capwell <[email protected]> wrote:
>
>> This is weird, and I hope it's not a bad UTC conversion tricking me....
>>
>> The splunk logs for the worker show the process logs were created at 9am
>> ("Logging into: ...."), but the first entry of the log was at 14:00
>> ("Filling up the DagBag"). If I go to the DB and calculate queue time,
>> this specific dag was delayed 5 hours, which matches the logs...
>>
>> On Mon, Mar 19, 2018, 9:10 AM David Capwell <[email protected]> wrote:
>>
>>> The major reason we have been waiting is mostly that 1.8.2 and 1.9
>>> are backwards incompatible (I don't remember the details off the top of
>>> my head, but one operator change broke imports so everything failed for
>>> us), so we neglected doing the work to support both versions (we need
>>> to support both since different teams move at different rates).
>>>
>>> We need to do this anyway (frozen in time is very bad).
>>>
>>> On Mon, Mar 19, 2018, 1:47 AM Driesprong, Fokko <[email protected]>
>>> wrote:
>>>
>>>> Hi David,
>>>>
>>>> First I would update to Apache Airflow 1.9.0, there have been a lot of
>>>> fixes between 1.8.2 and 1.9.0. Just to see if the bug is still in there.
>>>>
>>>> Cheers, Fokko
>>>>
>>>> 2018-03-18 19:41 GMT+01:00 David Capwell <[email protected]>:
>>>>
>>>> > Thanks for the reply.
>>>> >
>>>> > Our script doesn't set it, so it should be off; the process does not
>>>> > normally restart (monitoring has a counter for the number of restarts
>>>> > since deploy, currently at 0).
>>>> >
>>>> > At that point in time the UI showed the upstream tasks as green
>>>> > (success); we manually ran tasks, so we are no longer in the same
>>>> > state and can't check the UI right now.
>>>> >
>>>> > On Sun, Mar 18, 2018, 11:34 AM Bolke de Bruin <[email protected]>
>>>> > wrote:
>>>> >
>>>> > > Are you running with num_runs? If so, disable it. We have seen this
>>>> > > behavior with num_runs. You can also find out whether there is a
>>>> > > dependency issue by clicking on the task.
>>>> > >
>>>> > > B.
>>>> > >
>>>> > > Sent from my iPad
>>>> > >
>>>> > > > On 18 Mar 2018 at 19:08, David Capwell <[email protected]> wrote:
>>>> > > >
>>>> > > > We just started seeing this a few days ago after turning on SLAs
>>>> > > > for our tasks (not saying SLAs did this; it may have been
>>>> > > > happening before without us noticing), but we have a dag that
>>>> > > > runs once an hour and we see that 4-5 dag runs are marked running
>>>> > > > but tasks are not getting scheduled. When we get the SLA alert,
>>>> > > > the action we are taking right now is going to the UI and
>>>> > > > clicking run on tasks manually; this is only needed for the
>>>> > > > oldest dag run, and the rest recover after that. In the past 3
>>>> > > > days this has happened twice to us.
>>>> > > >
>>>> > > > We are running 1.8.2; are there any known jiras about this? I
>>>> > > > don't know the scheduler well; what could I do to see why these
>>>> > > > tasks are getting skipped without manual intervention?
>>>> > > >
>>>> > > > Thanks for your time.
