Re: [PROPOSAL] Moving connection form UI metadata to provider.yaml

Jarek Potiuk Fri, 16 Jan 2026 06:23:37 -0800

> There are a few more reasons why API server will continue to need the
> ProvidersManager:


Yeah, I was aware we likely have a few more things I forgot, but this idea
extends to those nicely:

1. Auth Managers -> I consider this as an api-server plugin :), or possibly
separate (apache-airflow-auth-manager) type of distribution (again this
will work nicely with the "shared" library)
2. Secrets Backends -> not sure if that is needed for api-server (maybe
just for configuration retrieval? ) this again can be a plugin - or
separate (apache-airflow-secrets-backend)
3. Providers List Endpoint: maybe we should get rid of this? -> Eventually
this should be part of the same Triggerer DB storage: the Triggerer
should store the list of installed providers in the DB. What we
currently have in api-server is already kinda wrong, because even now we
can potentially have different providers installed on api-server than on
workers/triggerers, and only those installed on api-server will show up.
Switching it to read from a DB that is updated by the Triggerer (also
including team_id, as there might be different sets of providers for
different teams) will make it "correct" (eventually).
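
A minimal sketch of the DB-backed provider list described in point 3, under stated assumptions: the `installed_provider` table, the helper names, and the version strings are all hypothetical (nothing like this exists in Airflow today), and sqlite3 is used only to keep the example self-contained.

```python
import sqlite3

# Hypothetical table: one row per (team, provider package).
DDL = """
CREATE TABLE installed_provider (
    team_id TEXT NOT NULL,
    package TEXT NOT NULL,
    version TEXT NOT NULL,
    PRIMARY KEY (team_id, package)
)
"""


def record_providers(conn, team_id, providers):
    """Replace the provider list reported for one team (the Triggerer's role)."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM installed_provider WHERE team_id = ?", (team_id,))
        conn.executemany(
            "INSERT INTO installed_provider VALUES (?, ?, ?)",
            [(team_id, pkg, ver) for pkg, ver in sorted(providers.items())],
        )


def list_providers(conn, team_id):
    """What the api-server would read instead of inspecting its own environment."""
    rows = conn.execute(
        "SELECT package, version FROM installed_provider "
        "WHERE team_id = ? ORDER BY package",
        (team_id,),
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
# Made-up version numbers, for illustration only.
record_providers(conn, "team_a", {"apache-airflow-providers-http": "1.0.0"})
record_providers(conn, "team_b", {"apache-airflow-providers-smtp": "1.0.0"})
print(list_providers(conn, "team_a"))
```

Keying the rows by `team_id` is what lets two teams report different provider sets without the api-server having either set installed.
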

But Yeah. We definitely can defer any of that to be done later, if we do
not find it "easier" to do it together - absolutely no pressure there, just
wanted to make sure the "North star" is quite commonly agreed, so that we
know where we are going :). We can definitely proceed with the current POC
"as is"

J.


On Fri, Jan 16, 2026 at 11:11 AM Amogh Desai <[email protected]> wrote:

> Thanks for the suggestion for using jsonschema!
>
> I updated the implementation to use jsonschema instead of the custom
> format. Now the structure looks like this for example:
>
> conn-fields:
>   timeout:
>     label: "Connection Timeout"
>     description: "Timeout in seconds"
>     schema:
>       type: integer
>       minimum: 1
>       maximum: 300
>       default: 30
>
> As for the concerns regarding GCP (14 fields including string, int,
> boolean, and password), I tested it and it
> works well (updated on PR). The code now uses schema object for all
> jsonschema validation properties like min, max, pattern,
> enum, etc while keeping UI metadata like label, description, sensitive or
> not at the top level. This aligns
> better with the react UI which already expects this format.
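
A minimal sketch of the split described above (UI metadata at the top level, validation rules under `schema`), assuming the field has already been parsed from provider.yaml into a dict. The helper names are hypothetical, and `check_integer` is a hand-rolled stand-in covering only integer bounds, not a real JSON Schema validator.

```python
# Field definition mirroring the `timeout` example above.
FIELD = {
    "label": "Connection Timeout",
    "description": "Timeout in seconds",
    "schema": {"type": "integer", "minimum": 1, "maximum": 300, "default": 30},
}


def split_field(field):
    """Separate UI metadata (top level) from validation rules (`schema`)."""
    ui_meta = {key: value for key, value in field.items() if key != "schema"}
    return ui_meta, dict(field.get("schema", {}))


def check_integer(value, schema):
    """Stand-in for a JSON Schema validator; checks integer type and bounds only."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    if "minimum" in schema and value < schema["minimum"]:
        return False
    if "maximum" in schema and value > schema["maximum"]:
        return False
    return True


ui_meta, schema = split_field(FIELD)
print(ui_meta["label"], check_integer(30, schema), check_integer(301, schema))
```
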
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Fri, Jan 16, 2026 at 12:50 PM Amogh Desai <[email protected]>
> wrote:
>
> > Ash -
> >
> > Good catch on the GCP concern. I checked it and this is what it uses:
> >
> >     @classmethod
> >     def get_connection_form_widgets(cls) -> dict[str, Any]:
> >         """Return connection widgets to add to connection form."""
> >         from flask_appbuilder.fieldwidgets import BS3PasswordFieldWidget, BS3TextFieldWidget
> >         from flask_babel import lazy_gettext
> >         from wtforms import BooleanField, IntegerField, PasswordField, StringField
> >         from wtforms.validators import NumberRange
> >
> >         return {
> >             "project": StringField(lazy_gettext("Project Id"), widget=BS3TextFieldWidget()),
> >             "key_path": StringField(lazy_gettext("Keyfile Path"), widget=BS3TextFieldWidget()),
> >             "keyfile_dict": PasswordField(lazy_gettext("Keyfile JSON"), widget=BS3PasswordFieldWidget()),
> >             "credential_config_file": StringField(
> >                 lazy_gettext("Credential Configuration File"), widget=BS3TextFieldWidget()
> >             ),
> >             "scope": StringField(lazy_gettext("Scopes (comma separated)"), widget=BS3TextFieldWidget()),
> >             "key_secret_name": StringField(
> >                 lazy_gettext("Keyfile Secret Name (in GCP Secret Manager)"), widget=BS3TextFieldWidget()
> >             ),
> >             "key_secret_project_id": StringField(
> >                 lazy_gettext("Keyfile Secret Project Id (in GCP Secret Manager)"), widget=BS3TextFieldWidget()
> >             ),
> >             "num_retries": IntegerField(
> >                 lazy_gettext("Number of Retries"),
> >                 validators=[NumberRange(min=0)],
> >                 widget=BS3TextFieldWidget(),
> >                 default=5,
> >             ),
> >             "impersonation_chain": StringField(
> >                 lazy_gettext("Impersonation Chain"), widget=BS3TextFieldWidget()
> >             ),
> >             "idp_issuer_url": StringField(
> >                 lazy_gettext("IdP Token Issue URL (Client Credentials Grant Flow)"),
> >                 widget=BS3TextFieldWidget(),
> >             ),
> >             "client_id": StringField(
> >                 lazy_gettext("Client ID (Client Credentials Grant Flow)"), widget=BS3TextFieldWidget()
> >             ),
> >             "client_secret": StringField(
> >                 lazy_gettext("Client Secret (Client Credentials Grant Flow)"),
> >                 widget=BS3PasswordFieldWidget(),
> >             ),
> >             "idp_extra_parameters": StringField(
> >                 lazy_gettext("IdP Extra Request Parameters"), widget=BS3TextFieldWidget()
> >             ),
> >             "is_anonymous": BooleanField(
> >                 lazy_gettext("Anonymous credentials (ignores all other settings)"), default=False
> >             ),
> >         }
> >
> >     @classmethod
> >     def get_ui_field_behaviour(cls) -> dict[str, Any]:
> >         """Return custom field behaviour."""
> >         return {
> >             "hidden_fields": ["host", "schema", "login", "password", "port", "extra"],
> >             "relabeling": {},
> >         }
> >
> > All of these are covered by my schema.
> >
> > I also checked what the react UI supports as of now. This is what I found:
> >
> > string - Text input
> > integer - Number input
> > number - Number input
> > boolean - Checkbox
> > object - JSON object editor
> > array - Array input
> >
> > String Formats:
> > format: "password" - Masked password field
> > format: "multiline" - Textarea
> > format: "date" - Date picker
> > format: "date-time" - DateTime picker
> > format: "time" - Time picker
> >
> > Array Types
> >
> > This all comes from the field selector logic:
> > https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/components/FlexibleForm/FieldSelector.tsx#L58-L92
> >
> > Fields are selected based on
> > - `schema.type` (string, integer, boolean, array, object)
> > - `schema.format` (password, multiline, date, date-time, time, email, url)
> > - `schema.enum` (if present, dropdown select)
> >
> > So essentially anything with a type, format, and enum defined can be
> > handled by react UI. That said, maybe I should
> > try and adopt using jsonschema format here.
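
The selection rules listed above can be sketched as a small lookup. This is an illustration in Python, not the actual `FieldSelector.tsx` logic: the widget names, the precedence (enum first, then string format, then type), and the fallback to a text input are assumptions.

```python
# Hypothetical widget names keyed by string format and by schema type.
BY_FORMAT = {
    "password": "password",
    "multiline": "textarea",
    "date": "date-picker",
    "date-time": "datetime-picker",
    "time": "time-picker",
}
BY_TYPE = {
    "string": "text",
    "integer": "number",
    "number": "number",
    "boolean": "checkbox",
    "object": "json-editor",
    "array": "array-input",
}


def select_widget(schema):
    """Pick a widget name from a JSON-Schema-like dict."""
    # Assumed precedence: an enum always renders as a dropdown.
    if "enum" in schema:
        return "dropdown"
    field_type = schema.get("type", "string")
    # Formats only refine string fields in this sketch.
    if field_type == "string" and schema.get("format") in BY_FORMAT:
        return BY_FORMAT[schema["format"]]
    # Unknown types fall back to a plain text input (assumption).
    return BY_TYPE.get(field_type, "text")


print(select_widget({"type": "string", "format": "password"}))
print(select_widget({"type": "string", "enum": ["disable", "prefer", "require"]}))
```
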
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> >
> > On Fri, Jan 16, 2026 at 12:36 PM Amogh Desai <[email protected]>
> > wrote:
> >
> >> Jarek -
> >>
> >> Re backcompat, yeah, I already have the fallback in place in my POC. The
> >> discovery code
> >> will first try to load the metadata from yaml, and if it fails to do so,
> >> it will use the *python method*
> >> flow to discover the metadata.
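
A minimal sketch of the YAML-first fallback described above. The function name, its signature, and the stand-in hook class are hypothetical, and "fails to load" is interpreted here as "the YAML entry is missing or has no `ui-field-behaviour` section", which is an assumption about the POC.

```python
def discover_ui_metadata(yaml_entry, load_hook_class):
    """Return (metadata, source), preferring static YAML over the hook method.

    `yaml_entry` is the parsed connection-type entry from provider.yaml, or None.
    `load_hook_class` is a zero-argument callable that imports the hook class,
    so the import only happens on the fallback path.
    """
    if yaml_entry and "ui-field-behaviour" in yaml_entry:
        return yaml_entry["ui-field-behaviour"], "yaml"
    hook_cls = load_hook_class()
    return hook_cls.get_ui_field_behaviour(), "python"


class LegacyHook:
    """Stand-in for a provider hook that has not been migrated to YAML yet."""

    @classmethod
    def get_ui_field_behaviour(cls):
        return {"hidden_fields": [], "relabeling": {"schema": "Database"}}


migrated = {"ui-field-behaviour": {"hidden-fields": [], "relabeling": {}}}
print(discover_ui_metadata(migrated, lambda: LegacyHook)[1])
print(discover_ui_metadata(None, lambda: LegacyHook)[1])
```

Passing a loader callable rather than the class itself keeps the hook import off the YAML path, which is the point of the proposal.
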
> >>
> >> Re the bigger vision about API servers without providers, I love where
> >> you are going with this, but I think we need to split up the tasks
> >> because we aren't there yet. Let me explain -
> >>
> >> Your idea (discover providers via the triggerer, store them in the DB, and
> >> have the API server read from the DB) might work for connection forms, but
> >> there are a few more reasons why API server will continue to need the
> >> ProvidersManager:
> >>
> >> 1. Auth Managers
> >> 2. Secrets Backends
> >> 3. Providers List Endpoint: maybe we should get rid of this? IDK who the
> >> consumer of this endpoint is
> >>
> >> So the API server without Providers thing is harder than just connection
> >> forms and we aren't there yet
> >> until we figure out the 3 points from above.
> >>
> >> I suggest we do this instead:
> >>
> >> Phase 1: Connection forms from YAML to establish foundation for the future
> >> Phase 2: The DB storage phase - decide if Triggerer / who populates in DB
> >> (Maybe not triggerer because we do not want it to have DB access
> >> eventually)
> >>
> >> Does that sound reasonable? What do you think?
> >>
> >>
> >> Thanks & Regards,
> >> Amogh Desai
> >>
> >>
> >> On Fri, Jan 16, 2026 at 4:46 AM Jarek Potiuk <[email protected]> wrote:
> >>
> >>> > One main thing was assuming that all providers need to be available on
> >>> > Scheduler (? I think that changed?) that there the connection form
> >>> > definitions are persisted to DB such that the API server directly can
> >>> > read from there - no need to install providers on API Server!
> >>>
> >>> I think Triggerer is better than Scheduler to persist connection
> >>> definitions to the DB. Essentially Triggerer is the only component that
> >>> needs DB access and also needs to have providers installed. Any of the
> >>> providers might implement Triggers and they are very tightly coupled with
> >>> "Hooks" and "Operators".  Scheduler only really needs **scheduler plugins**
> >>> (Timetables and such) and **executors** (which we eventually want to
> >>> split-off from current "worker" providers). It does not need "worker
> >>> providers".
> >>>
> >>> IMHO in many discussions of ours this long term plan / vision is most
> >>> appealing:
> >>>
> >>> * api-server: only needs distributions that are "ui plugins" (no providers)
> >>> * scheduler: only needs distributions that are "scheduler plugins" (e.g.
> >>>   timetables) and "executors"
> >>> * worker: only needs "worker/triggerer providers" (i.e. hooks and operators
> >>>   essentially) and "worker plugins" (e.g. macros)
> >>> * triggerer: only needs "worker/triggerer providers" (as in workers) -
> >>>   possibly "triggerer plugins" if we ever have a need to have them
> >>>
> >>> Eventually, optionally, each of those ("api-server", "scheduler",
> >>> "worker", "triggerer") should be a separate distribution, each with its
> >>> own dependencies. But this only makes sense if we find that those
> >>> dependencies could be very different between them - it's likely this will
> >>> not happen, because the dependency set for each of those "components" will
> >>> be very close when we finalize the current task-sdk isolation work.
> >>>
> >>> Of course we cannot do it all at once and it will take quite some time to
> >>> get there.
> >>>
> >>> But I think we should have it as a "North Star" that we should look at
> >>> when we make any "architecture" decisions.  And every decision we make
> >>> should bring us closer to this "North Star".
> >>>
> >>> Also - just to note - with the "shared" libraries concept we already have,
> >>> and with "uv workspace" in our monorepo - we have ALL the mechanisms needed
> >>> to make it happen. And to do it in a very maintainable way with very little
> >>> overhead and virtually no change in regular development workflow. For
> >>> example the shared libraries concept might be used to share common code for
> >>> both: apache-airflow-providers-cncf-kubernetes (worker provider - KPO
> >>> essentially - installable for worker and triggerer) and (future)
> >>> apache-airflow-executors-cncf-kubernetes (executor installable for
> >>> scheduler). Same for amazon worker provider/executor split and edge worker
> >>> provider/executor split. All that is doable.
> >>>
> >>> J.
> >>>
> >>>
> >>>
> >>> On Thu, Jan 15, 2026 at 10:23 PM Jens Scheffler <[email protected]>
> >>> wrote:
> >>>
> >>> > Also +100 from my side.
> >>> >
> >>> > We discussed exactly this in an Airflow 3 dev call, I was looking for the
> >>> > notes... that was when we discussed the component split in the
> >>> > future. Found a reference in
> >>> >
> >>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-22August2024
> >>> >
> >>> > ```
> >>> >
> >>> > **Plan for Decoupling Providers' Connections metadata from FAB (Jens
> >>> > Scheffler <https://cwiki.apache.org/confluence/display/~jscheffl>)**
> >>> >
> >>> >   * Jens created this draft PR
> >>> >     <https://github.com/apache/airflow/pull/41656> with the POC for it
> >>> >     and presented it on the call.
> >>> >   * Jarek <https://cwiki.apache.org/confluence/display/~potiuk> proposed
> >>> >     the idea of dumping the JSON/YAML with connection fields in the
> >>> >     Database or loading it via package metadata so we don't load all the
> >>> >     dependencies on the webserver.
> >>> >   * We will need some plan for external providers on how they can define
> >>> >     connections or register them.
> >>> >   * The POC successfully proved that we can separate the connection
> >>> >     metadata from FAB
> >>> >   * /*Action Item*/: Jens
> >>> >     <https://cwiki.apache.org/confluence/display/~jscheffl> to create a
> >>> >     GitHub issue for decoupling the Connection metadata from FAB
> >>> >
> >>> > ```
> >>> >
> >>> > Also on Sep 19th 2024 we had an overview of which pieces of the providers
> >>> > are needed where:
> >>> >
> >>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-19September2024
> >>> >
> >>> > The follow-up notes are in this GitHub ticket:
> >>> > https://github.com/apache/airflow/issues/42016
> >>> >
> >>> >
> >>> > One main thing was assuming that all providers need to be available on
> >>> > Scheduler (? I think that changed?) that there the connection form
> >>> > definitions are persisted to DB such that the API server directly can
> >>> > read from there - no need to install providers on API Server!
> >>> >
> >>> > Looking forward to the contribution... I assume no VOTE needed :-D
> >>> >
> >>> > Jens
> >>> >
> >>> > On 1/15/26 15:52, Ash Berlin-Taylor wrote:
> >>> > > As an idea/structure I think it's certainly the right way to go -
> >>> > > needing the code, and the instantiated widget classes, only to (I
> >>> > > suspect) throw them away in the new React UI certainly seems like a
> >>> > > silly idea now.
> >>> > >
> >>> > > In your POC I don't think you have got the ability to have the extra
> >>> > > fields that, for instance, Google Cloud connection has yet though.
> >>> > >
> >>> > > As for the schema we need to express: I'd say we should look at what the
> >>> > > react UI currently supports?
> >>> > >
> >>> > > -ash
> >>> > >
> >>> > >> On 15 Jan 2026, at 14:07, Amogh Desai <[email protected]> wrote:
> >>> > >>
> >>> > >> Hi All,
> >>> > >>
> >>> > >> I wanted to get feedback on something I have been twiddling with. For
> >>> > >> context, the API server has to import every single hook class from all
> >>> > >> providers just to render connection forms in the UI. This is because the
> >>> > >> UI metadata (what fields to show, labels, validators, etc.) lives in
> >>> > >> python functions like `get_connection_form_widgets()` and
> >>> > >> `get_ui_field_behaviour()`, which are defined on the hook classes.
> >>> > >>
> >>> > >> This means:
> >>> > >> - API server startup imports 100+ hook classes it might not actually need
> >>> > >> - Slower startup due to heavier memory footprint
> >>> > >> - Poor client-server separation (why does the API server need to know
> >>> > >>   about pyodbc just to show a UI form?)
> >>> > >>
> >>> > >> My proposal
> >>> > >>
> >>> > >> Moving the UI metadata from python code to something static / declarative
> >>> > >> like yaml. I want to add this information in the provider.yaml file that
> >>> > >> every provider already has. For example -
> >>> > >>
> >>> > >> class PostgresHook(BaseHook):
> >>> > >>     @classmethod
> >>> > >>     def get_ui_field_behaviour(cls) -> dict[str, Any]:
> >>> > >>         return {
> >>> > >>             "hidden_fields": [],
> >>> > >>             "relabeling": {
> >>> > >>                 "schema": "Database",
> >>> > >>             },
> >>> > >>         }
> >>> > >>
> >>> > >> Will become:
> >>> > >>
> >>> > >> connection-types:
> >>> > >>   - connection-type: postgres
> >>> > >>     hook-class-name: airflow.providers.postgres.hooks.postgres.PostgresHook
> >>> > >>
> >>> > >>     ui-field-behaviour:
> >>> > >>       hidden-fields: []
> >>> > >>       relabeling:
> >>> > >>         schema: "Database"
> >>> > >>
> >>> > >>     conn-fields:
> >>> > >>       sslmode:
> >>> > >>         type: string
> >>> > >>         label: SSL Mode
> >>> > >>         enum: ["disable", "prefer", "require"]
> >>> > >>         default: "prefer"
> >>> > >>
> >>> > >>       timeout:
> >>> > >>         type: integer
> >>> > >>         label: Timeout
> >>> > >>         range: [1, 300]
> >>> > >>         default: 30
> >>> > >>
> >>> > >> The schema will now consist of two new sections:
> >>> > >>
> >>> > >> 1. ui-field-behaviour
> >>> > >> - Used to customize the standard connection fields (host, port, login,
> >>> > >>   etc.)
> >>> > >> - hidden-fields: Hide some fields
> >>> > >> - relabeling: Change labels for some fields (like schema -> Database
> >>> > >>   above)
> >>> > >> - placeholders: Show hints in the form (port 5432 for example)
> >>> > >>
> >>> > >> 2. conn-fields
> >>> > >> - Can be used to define custom fields stored in Connection.extra
> >>> > >> - You can define inline validators like enum, range, pattern, min-length,
> >>> > >>   max-length
> >>> > >> - Will support the standard wtforms string, integer, boolean, number types
> >>> > >>
> >>> > >> As for why this schema was chosen, check the comparison with alternatives
> >>> > >> in the PR description: https://github.com/apache/airflow/pull/60410
> >>> > >>
> >>> > >>
> >>> > >> Current Status
> >>> > >>
> >>> > >> I have a POC in https://github.com/apache/airflow/pull/60410 where I
> >>> > >> chose two pilot providers of varying difficulty: HTTP and SMTP (HTTP is
> >>> > >> easy, just a vanilla form, but SMTP has some hidden fields).
> >>> > >>
> >>> > >>
> >>> > >> Benefits this will offer
> >>> > >>
> >>> > >> - Once complete, the API server won't import any hook classes for UI
> >>> > >>   rendering, leading to faster startup
> >>> > >> - Provider dependencies don't affect API server
> >>> > >> - YAML is easier to read/write than python functions for form metadata
> >>> > >> Would love feedback on:
> >>> > >> 1. Schema design - does it cover your use cases?
> >>> > >> 2. Any missing field types or validators?
> >>> > >>
> >>> > >> The goal is to get the pilot providers in so we can start migrating
> >>> > >> providers incrementally. Old way still works, so no rush for everyone to
> >>> > >> migrate at once.
> >>> > >>
> >>> > >> Thoughts?
> >>> > >>
> >>> > >> Thanks & Regards,
> >>> > >> Amogh Desai
> >>> > >
> >>> > >
> >>>
> >>
>
