I am missing the part about how a DAG author can be aware of the backend
order the cluster admin chooses.
This is a crucial part.
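
To make the concern concrete, here is a minimal sketch in plain Python. It
is not Airflow's API: the backend names, the dictionaries and the
`resolve` helper are hypothetical stand-ins. It only shows how the same
connection ID can resolve to different values depending on the order the
admin picked:

```python
from typing import Callable, Optional

# Hypothetical stand-ins for two backends that both define "my_conn".
ENV = {"my_conn": "postgres://env-host/db"}
METASTORE = {"my_conn": "postgres://db-host/db"}

# Maps a backend name to a lookup function returning the value or None.
BACKENDS: dict[str, Callable[[str], Optional[str]]] = {
    "environment_variable": ENV.get,
    "metastore": METASTORE.get,
}


def resolve(conn_id: str, order: list[str]) -> tuple[Optional[str], Optional[str]]:
    """Return (backend_name, value) from the first backend that has conn_id."""
    for name in order:
        if name not in BACKENDS:
            raise ValueError(f"unsupported backend: {name}")
        value = BACKENDS[name](conn_id)
        if value is not None:
            return name, value
    return None, None


# Same DAG code, two admin-chosen orders, two different answers.
print(resolve("my_conn", ["environment_variable", "metastore"]))
print(resolve("my_conn", ["metastore", "environment_variable"]))
```

The DAG code is identical in both calls; only the configured order
differs, which is why the author needs some way to see which order is in
effect.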

On Thu, Jul 3, 2025 at 12:14 PM Jarek Potiuk <ja...@potiuk.com> wrote:

> Sorry for typos - that was my mobile auto complete... I hope it is
> understandable anyway
>
> On Thu, Jul 3, 2025 at 11:13 Jarek Potiuk <ja...@potiuk.com> wrote:
>
> >
> >
> >
> > On Thu, Jul 3, 2025 at 10:14 Amogh Desai <amoghdesai....@gmail.com>
> > wrote:
> >
> >> Thanks for that angle, Jarek.
> >>
> >> Let's say the DB lookup has higher precedence than, say, the ENV
> >> backend. Wouldn't we be shooting ourselves in the foot by compromising
> >> performance here? A DB lookup will be more expensive than an ENV lookup.
> >>
> >>
> > Oh absolutely. I think if we have this possibility of managing the
> > order, those kinds of scenarios should also be explained in the docs so
> > that users do not shoot themselves in the foot.
> >
> > Also, following my mail about multi-team: I started to think recently -
> > looking at some other OSS software - that we sometimes take too much
> > responsibility for our users, and then suffer because we have to defend
> > our opinionated choices when there are use cases that our choices do
> > not enable.
> >
> > This is the reason why we have so many 'options' and config values:
> > sometimes we do not want to make decisions for our users, but we can
> > make it an option and configuration and clearly explain it to our users
> > (and mostly I am talking about the Deployment Manager role from our
> > security model). It's their responsibility to read all the information
> > we provide and follow it when they make decisions on how to configure
> > Airflow - knowing the consequences. And we should be 'harsh' with them -
> > in the sense that if they did not read the docs and did not understand
> > them, any time they ask us about something not working that is explained
> > in the docs, we should send them to the docs with 'Read The Friendly
> > Manual' advice - simply because this is the only job they have. And we
> > should not do the job for them.
> >
> > Similarly, having options like that allows our managed service providers
> > to make their opinionated choices: to make some configuration options
> > available, and to pre-select others for their users in the context of
> > the managed service. But again - it's their responsibility to manage and
> > understand what the options are and what they mean. The same goes for
> > individual deployment managers - they can make their own decisions - and
> > if it does not cost us a lot, we should make it possible for them to
> > make those choices (and take responsibility for their choices).
> >
> > With great powers (of choice) you also have great responsibilities (for
> > the consequences of your choices) - and as long as we are aware of those
> > consequences and communicate them to deployment managers, it's on their
> > shoulders to make the choices and bear the consequences.
> >
> > J.
> >
> >
> >
> >> There could also be a few more side effects that we will have to fully
> >> uncover and come up
> >> with a detailed plan to allow this to be configurable.
> >>
> >> Thanks & Regards,
> >> Amogh Desai
> >>
> >>
> >> On Wed, Jul 2, 2025 at 6:43 PM Jarek Potiuk <ja...@potiuk.com> wrote:
> >>
> >> > I think this is a good idea - but as Ash mentioned, it has to be
> >> > executed well, with a lot of bells and whistles, so that users will
> >> > not shoot themselves in the foot. For example, we recently had
> >> > discussions on the new UI about whether/how to explain to users that
> >> > their connections in the UI and API **only** show the DB connections
> >> > (for good reasons) - and that is already difficult to explain. This
> >> > change will also make it behave differently: currently, when you edit
> >> > a connection via the UI, it might **not** take effect if you have the
> >> > same connection defined in a secret/env var. But if you make the DB
> >> > first, this changes, and there are a few edge cases where it might
> >> > have some unexpected effect.
> >> >
> >> > But there is one undeniable benefit of this approach that I like - the
> >> > ability to turn the Airflow DB into an effective "shield" for secret
> >> > usage. The big drawback of the current "sequence" is that Airflow
> >> > generates a LOT of queries to the secrets manager, even if your
> >> > connection is defined in the DB - because it will query secrets first.
> >> > So currently it is not possible to say "for this highly frequently
> >> > used connection, I want to keep it in the DB to save on secrets
> >> > manager queries, both performance- and cost-wise" - because defining a
> >> > connection in the DB does not limit the number of secrets manager
> >> > queries. So in a number of scenarios, being able to reverse it and
> >> > query the DB first might be very good for cost and network
> >> > optimisation.
> >> >
> >> > I think if we describe it well in the docs (as Ash wrote), explain
> >> > those scenarios, and also clearly communicate it in the UI of Airflow
> >> > (we likely need some way of explaining to the user what their
> >> > currently configured sequence is and what they should expect to happen
> >> > if they remove/add a connection) - then I see it as a really useful
> >> > feature.
> >> >
> >> > J.
> >> >
> >> > On Wed, Jul 2, 2025 at 2:54 PM Ash Berlin-Taylor <a...@apache.org> wrote:
> >> >
> >> > > At a high level I’m good with allowing this to be fully
> >> > > configurable, as long as we document the possible warts (“Doctor, it
> >> > > hurts when I do this” “well don’t do that then!” etc) - though as
> >> > > Amogh mentioned it is slightly complicated by the distinction between
> >> > > API Server/Scheduler and the execution time on the worker.
> >> > >
> >> > > (I haven’t looked at the specific implementation yet)
> >> > >
> >> > > -ash
> >> > >
> >> > > > On 2 Jul 2025, at 11:56, Amogh Desai <amoghdesai....@gmail.com> wrote:
> >> > > >
> >> > > > Hello Anton,
> >> > > >
> >> > > > Thanks for kicking off this discussion. I’d love to understand your
> >> > > > motivations a bit more on this front.
> >> > > > From your PR, I see that you are not just allowing the addition of
> >> > > > multiple custom backends but also changing the *default_backend*
> >> > > > order. I am a bit torn on that part.
> >> > > >
> >> > > > The current design intentionally places the metadata DB backend at
> >> > > > the lowest precedence in the order, since it’s meant to serve as
> >> > > > the ultimate fallback source of truth. Any additional configured
> >> > > > backends are prioritized higher than it by design.
> >> > > >
> >> > > > With your changes, we now allow configurations like:
> >> > > >
> >> > > >
> >> > > >
> >> > > >     @conf_vars({("secrets", "backends_order"): "metastore,environment_variable,unsupported"})
> >> > > >     def test_backends_order_unsupported(self):
> >> > > >         with pytest.raises(AirflowConfigException):
> >> > > >             ensure_secrets_loaded()
> >> > > >
> >> > > > I don’t fully understand the motivation behind supporting this
> >> > > > level of override, especially since it could allow unsupported or
> >> > > > unintended configurations. Additionally, with Airflow 3.0+, we
> >> > > > already support a multi-layered secret backend resolution
> >> > > > capability with the introduction of secrets backends for workers.
> >> > > > The order goes as:
> >> > > >
> >> > > > *secrets backend on worker directly (optional) > env vars on worker >*
> >> > > > *reach out to api server [secrets backend defined here (optional) >
> >> > > > env vars on api server > metadata DB].*
> >> > > >
> >> > > > You will have to consider this angle too.
> >> > > >
> >> > > > In my opinion, a more practical and realistic use case would be to
> >> > > > have the ability to define multiple custom backends on both the
> >> > > > worker and the API server.
> >> > > >
> >> > > > Looking forward to hearing more from you.
> >> > > >
> >> > > > Thanks & Regards,
> >> > > > Amogh Desai
> >> > > >
> >> > > >
> >> > > > On Wed, Jul 2, 2025 at 3:59 PM Anton Nitochkin
> >> > > > <ant.nitoch...@gmail.com> wrote:
> >> > > >
> >> > > >> Hello,
> >> > > >>
> >> > > >> I'd like to discuss a new option that can be added via this PR:
> >> > > >> https://github.com/apache/airflow/pull/45931.
> >> > > >>
> >> > > >> Recently, I asked developers in Slack for their thoughts on the
> >> > > >> new variable [secrets]backend_order. Long story short: this option
> >> > > >> will introduce the ability to configure the backend order and
> >> > > >> control it using this variable. The default value will remain the
> >> > > >> same as in the current version, so for users who don't need it,
> >> > > >> things will stay as they are now.
> >> > > >>
> >> > > >> Jarek Potiuk advised starting a conversation and discussing the PR
> >> > > >> to reach a consensus with the community.
> >> > > >>
> >> > > >> Can you please share your thoughts on the option and its
> >> > > >> implementation?
> >> > > >>
> >> > > >> Anton Nitochkin
> >> > > >>
> >> > >
> >> > >
> >> > >
> ---------------------------------------------------------------------
> >> > > To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> >> > > For additional commands, e-mail: dev-h...@airflow.apache.org
> >> > >
> >> > >
> >> >
> >>
> >
>
