On 2018/08/09 06:27:30, Bolke de Bruin <bdbr...@gmail.com> wrote:
> Hi vardang,
>
> What do you intend to gain from this metric? Many factors influence the
> difference between execution date and start date. You named one of them, but
> there are also functional ones (limits reached, etc.). We are not a real-time
> system, so we never purposefully aimed to lower that difference.
>
> B.
>
> Sent from my iPad
>
> > On 9 Aug 2018, at 08:04, vardangupta...@gmail.com
> > <vardangupta...@gmail.com> wrote:
> >
> >
> >
> >> On 2018/08/06 07:07:05, vardangupta...@gmail.com
> >> <vardangupta...@gmail.com> wrote:
> >> Hi Everyone,
> >>
> >> We want to calculate a metric that captures the delay (if any) between a
> >> DAG becoming active in the scheduler and server and its tasks actually
> >> being kicked off (suppose start_date was 1 hour earlier and the schedule
> >> is every 10 minutes).
> >>
> >> Currently the task_instance table has execution_date, start_date, end_date
> >> and queued_dttm. We can easily get this metric from the difference between
> >> start_date and execution_date, but in the case of a backfill,
> >> execution_date will be that of a previous schedule occurrence, so the
> >> difference will be skewed. The number is fine for future runs, but for
> >> backfills it won't be trustworthy. Any suggestions on how to compute this
> >> metric smartly, maybe by somehow knowing the backfill details? Even the
> >> DAG table has no notion of create_date or update_date that could tell me
> >> when the DAG was originally brought into existence.
> >>
> >>
> >> Regards,
> >> Vardan Gupta
> >>
> > Can someone look at the issue?
>
Yes, you're right. Airflow is not designed for real-time scheduling, but as a
service provider in our organization, we wanted to arrive at a number before
talking to our internal teams, so that we could tell them, for example, that
at the 95th percentile the scheduling delay will be no more than x minutes.
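
To make the target concrete, here is a minimal sketch of how such a number
could be computed from (execution_date, start_date) pairs read from the
task_instance table. It rests on several assumptions: a fixed schedule
interval, a run for execution_date T becoming due at T + interval, and a
hypothetical active_since timestamp (recorded by us, since the DAG table does
not store one) used to clamp backfill runs so they are measured from the
moment the DAG became active. The function names are illustrative and not
part of Airflow.

```python
import math
from datetime import datetime, timedelta


def scheduling_delays(rows, interval, active_since=None):
    """Return per-task scheduling delays in seconds.

    rows: iterable of (execution_date, start_date) datetime pairs.
    interval: the DAG's schedule interval as a timedelta (assumed fixed).
    active_since: hypothetical timestamp at which the DAG became active;
        backfill runs that were due earlier are measured from this moment.
    """
    delays = []
    for execution_date, start_date in rows:
        if start_date is None:
            continue  # task has not started yet
        # A run for execution_date T becomes due at T + interval.
        expected = execution_date + interval
        # Assumption: a backfill run could not start before the DAG existed.
        if active_since is not None and expected < active_since:
            expected = active_since
        delays.append(max((start_date - expected).total_seconds(), 0.0))
    return delays


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    active = datetime(2018, 8, 6, 9, 0, 0)
    rows = [
        # backfill run: due at 08:10, but the DAG only became active at 09:00
        (datetime(2018, 8, 6, 8, 0, 0), datetime(2018, 8, 6, 9, 0, 30)),
        # regular run: due at 09:10
        (datetime(2018, 8, 6, 9, 0, 0), datetime(2018, 8, 6, 9, 10, 5)),
    ]
    delays = scheduling_delays(rows, timedelta(minutes=10), active)
    print(percentile(delays, 95))
```

Without active_since, the backfill row above would report a delay of roughly
50 minutes, which is the skew described earlier in the thread.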