Yeah, it is valuable to show it in the logs. For showing it in the webserver
or storing it in the DB, though, the cardinality of this data is too high.

On Fri, 14 Jun 2024 at 11:09, Eugen Kosteev <[email protected]> wrote:

> Yeah, I also think it is a good idea to expose it in the Airflow UI.
>
> Although, at the moment we do not have a DAG file entity in the Airflow
> database (and this metric is per DAG file), so we would need to design
> that a bit. Attaching it to the DAG model would not be correct.
>
> But I totally agree, it would be good to have it in the Airflow UI as well,
> so that "operations users" have access to this information.
>
> On Fri, Jun 14, 2024 at 11:22 AM Jarek Potiuk <[email protected]> wrote:
>
> > Good idea. It would also be good to have this information exposed in
> > the UI, so that "operations users" can see it and maybe even act on it,
> > plus an API/CLI to check it. I think that in Airflow 3, where we will
> > have task isolation, having `0` for all the DAGs will be a prerequisite
> > for switching to "task isolation" mode, and this could actually be
> > verified in a migration tool.
> >
> > On Fri, Jun 14, 2024 at 10:59 AM Eugen Kosteev <[email protected]> wrote:
> >
> > > Hi.
> > >
> > > I would like to discuss a proposal to add a new column to the "DAG File
> > > Processing Stats" table in the DAG processor logs.
> > >
> > > Currently, the DAG processor logs contain the following data per DAG
> > > file (screenshot below), including the number of DAGs, runtime, etc.
> > > [image: image.png]
> > >
> > > It seems that it would be beneficial to also include the number of
> > > queries made to the Airflow database while parsing each file.
> > > This may be convenient when debugging issues related to high load on
> > > the Airflow database. A typical scenario is when DAG file(s) run a lot
> > > of database queries in top-level code, and those queries are executed
> > > every time these DAG files are parsed.
> > > One common example is excessive use of "Variable.get" in top-level
> > > statements of DAG files.
> > >
> > > Having the number of queries to the Airflow database per DAG file may
> > > help a lot when debugging high load on the database or slow parsing of
> > > DAG files.
> > >
> > > One caveat is that, e.g. due to caching enabled for Variables or for
> > > other reasons (dynamic DAGs), the number of queries may vary a lot
> > > between parses of the same DAG file. But at least we can have it as
> > > "Last Run Number of Queries": that would already give some idea, and
> > > engineers can also review the logs to see historical values.
> > >
> > > What are your thoughts?
> > >
> > > --
> > > Eugene
> > >
> >
>
>
> --
> Eugene
>
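
A minimal sketch of the per-file query counter proposed in this thread,
under stated assumptions: it uses Python's stdlib sqlite3 and its
Connection.set_trace_callback hook as a stand-in for the Airflow metadata
database (Airflow talks to its database through SQLAlchemy, where an engine
event listener such as "before_cursor_execute" would presumably play the
same role). The names count_queries, parse_file, parse_all, variable_get and
files_with_db_access are hypothetical illustrations, not Airflow APIs, and
"parsing" here simply means executing a file's top-level code.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def count_queries(conn):
    """Count every SQL statement `conn` executes inside the `with` block."""
    counter = {"queries": 0}

    def _on_statement(_sql):
        counter["queries"] += 1

    conn.set_trace_callback(_on_statement)
    try:
        yield counter
    finally:
        conn.set_trace_callback(None)  # stop counting once parsing ends


def parse_file(source, conn):
    """Hypothetical stand-in for parsing one DAG file: run its top-level code."""

    def variable_get(key):
        # Hypothetical helper standing in for a metadata-DB lookup like Variable.get
        row = conn.execute("SELECT val FROM variable WHERE key = ?", (key,)).fetchone()
        return row[0] if row else None

    exec(source, {"variable_get": variable_get})


def parse_all(files, conn):
    """Parse each file once; return {path: queries issued during the last parse}."""
    last_run_queries = {}
    for path, source in files.items():
        with count_queries(conn) as counter:
            parse_file(source, conn)
        last_run_queries[path] = counter["queries"]
    return last_run_queries


def files_with_db_access(stats):
    """Hypothetical pre-migration check: files whose last parse hit the DB at all."""
    return sorted(path for path, queries in stats.items() if queries > 0)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variable (key TEXT PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO variable VALUES (?, ?)", [("bucket", "b1"), ("region", "eu")])
conn.commit()  # setup is finished before any counting starts

files = {
    # Two top-level lookups: two queries on every parse of this file
    "dags/top_level_vars.py": 'bucket = variable_get("bucket")\nregion = variable_get("region")\n',
    # No top-level DB access: zero queries per parse
    "dags/clean.py": 'schedule = "@daily"\n',
}

stats = parse_all(files, conn)
for path, queries in stats.items():
    print(f"{path:<28} Last Run Number of Queries: {queries}")
print("Files with DB access at parse time:", files_with_db_access(stats))
```

Because the counter is reset for every file, the dict only holds the "Last
Run Number of Queries", in line with the caveat that counts can vary between
parses. files_with_db_access sketches the kind of "zero queries for all DAG
files" check suggested above for a migration tool.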
