Re: [PROPOSAL] Moving connection form UI metadata to provider.yaml

Amogh Desai Thu, 15 Jan 2026 23:21:18 -0800

Ash -

Good catch on the GCP concern. I checked it and this is what it uses:


    @classmethod
    def get_connection_form_widgets(cls) -> dict[str, Any]:
        """Return connection widgets to add to connection form."""
        from flask_appbuilder.fieldwidgets import BS3PasswordFieldWidget, BS3TextFieldWidget
        from flask_babel import lazy_gettext
        from wtforms import BooleanField, IntegerField, PasswordField, StringField
        from wtforms.validators import NumberRange

        return {
            "project": StringField(lazy_gettext("Project Id"), widget=BS3TextFieldWidget()),
            "key_path": StringField(lazy_gettext("Keyfile Path"), widget=BS3TextFieldWidget()),
            "keyfile_dict": PasswordField(lazy_gettext("Keyfile JSON"), widget=BS3PasswordFieldWidget()),
            "credential_config_file": StringField(
                lazy_gettext("Credential Configuration File"), widget=BS3TextFieldWidget()
            ),
            "scope": StringField(lazy_gettext("Scopes (comma separated)"), widget=BS3TextFieldWidget()),
            "key_secret_name": StringField(
                lazy_gettext("Keyfile Secret Name (in GCP Secret Manager)"), widget=BS3TextFieldWidget()
            ),
            "key_secret_project_id": StringField(
                lazy_gettext("Keyfile Secret Project Id (in GCP Secret Manager)"), widget=BS3TextFieldWidget()
            ),
            "num_retries": IntegerField(
                lazy_gettext("Number of Retries"),
                validators=[NumberRange(min=0)],
                widget=BS3TextFieldWidget(),
                default=5,
            ),
            "impersonation_chain": StringField(
                lazy_gettext("Impersonation Chain"), widget=BS3TextFieldWidget()
            ),
            "idp_issuer_url": StringField(
                lazy_gettext("IdP Token Issue URL (Client Credentials Grant Flow)"),
                widget=BS3TextFieldWidget(),
            ),
            "client_id": StringField(
                lazy_gettext("Client ID (Client Credentials Grant Flow)"), widget=BS3TextFieldWidget()
            ),
            "client_secret": StringField(
                lazy_gettext("Client Secret (Client Credentials Grant Flow)"),
                widget=BS3PasswordFieldWidget(),
            ),
            "idp_extra_parameters": StringField(
                lazy_gettext("IdP Extra Request Parameters"), widget=BS3TextFieldWidget()
            ),
            "is_anonymous": BooleanField(
                lazy_gettext("Anonymous credentials (ignores all other settings)"), default=False
            ),
        }

    @classmethod
    def get_ui_field_behaviour(cls) -> dict[str, Any]:
        """Return custom field behaviour."""
        return {
            "hidden_fields": ["host", "schema", "login", "password", "port", "extra"],
            "relabeling": {},
        }

All of these are covered by my schema.
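To make that concrete, here is a rough Python sketch that maps a subset of the GCP WTForms fields above onto declarative conn-fields entries. The field list is transcribed by hand as plain tuples so it runs without flask or wtforms, and the output keys `format` and `minimum` are my assumptions about the final provider.yaml syntax, not something the POC settles yet. Note that `client_secret` is a `StringField` rendered with `BS3PasswordFieldWidget`, so the widget, not just the field class, has to decide whether a field is masked.

```python
from typing import Any, Optional

# WTForms field class -> (schema type, string format). Assumption: masked
# fields use format "password", matching what the React UI renders as masked.
FIELD_TYPE_MAP: dict[str, tuple[str, Optional[str]]] = {
    "StringField": ("string", None),
    "PasswordField": ("string", "password"),
    "IntegerField": ("integer", None),
    "BooleanField": ("boolean", None),
}

# Hand-transcribed subset of the GCP form: (name, field class, label, widget, extras).
GCP_FIELDS = [
    ("project", "StringField", "Project Id", "BS3TextFieldWidget", {}),
    ("keyfile_dict", "PasswordField", "Keyfile JSON", "BS3PasswordFieldWidget", {}),
    ("num_retries", "IntegerField", "Number of Retries", "BS3TextFieldWidget", {"default": 5, "min": 0}),
    ("client_secret", "StringField", "Client Secret (Client Credentials Grant Flow)", "BS3PasswordFieldWidget", {}),
    ("is_anonymous", "BooleanField", "Anonymous credentials (ignores all other settings)", None, {"default": False}),
]


def to_conn_field(field_cls: str, label: str, widget: Optional[str], extras: dict[str, Any]) -> dict[str, Any]:
    """Build one hypothetical conn-fields entry from a WTForms field description."""
    schema_type, fmt = FIELD_TYPE_MAP[field_cls]
    # A password widget masks the field even when the class is a plain StringField.
    if widget == "BS3PasswordFieldWidget":
        fmt = "password"
    entry: dict[str, Any] = {"type": schema_type, "label": label}
    if fmt:
        entry["format"] = fmt
    if "default" in extras:
        entry["default"] = extras["default"]
    if "min" in extras:
        # NumberRange(min=0) has no upper bound, so only a minimum is emitted.
        entry["minimum"] = extras["min"]
    return entry


conn_fields = {
    name: to_conn_field(field_cls, label, widget, extras)
    for name, field_cls, label, widget, extras in GCP_FIELDS
}
```

The remaining GCP fields are plain text inputs and translate the same way as `project`.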

I also checked what the React UI supports as of now, and this is what I found:

string - Text input
integer - Number input
number - Number input
boolean - Checkbox
object - JSON object editor
array - Array input

String Formats:
format: "password" - Masked password field
format: "multiline" - Textarea
format: "date" - Date picker
format: "date-time" - DateTime picker
format: "time" - Time picker

Array Types

All of this comes from the field selector logic:
https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/components/FlexibleForm/FieldSelector.tsx#L58-L92

Fields are selected based on:
- `schema.type` (string, integer, number, boolean, array, object)
- `schema.format` (password, multiline, date, date-time, time, email, url)
- `schema.enum` (if present, dropdown select)
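A rough Python paraphrase of those selection rules, to show how a declarative field definition would resolve to a widget. This is not the actual FieldSelector.tsx code: the widget names are descriptive strings, and a few details are my assumptions, namely that `enum` takes precedence over `type`/`format`, that a missing type means string, and that unknown types and formats (e.g. email, url, which have no widget in the list above) fall back to a plain text input.

```python
from typing import Any

# Base widget per schema type (descriptive names, not React component names).
TYPE_WIDGETS = {
    "string": "text input",
    "integer": "number input",
    "number": "number input",
    "boolean": "checkbox",
    "object": "JSON object editor",
    "array": "array input",
}

# Widgets for string formats, as listed above.
STRING_FORMAT_WIDGETS = {
    "password": "masked password field",
    "multiline": "textarea",
    "date": "date picker",
    "date-time": "datetime picker",
    "time": "time picker",
}


def select_widget(schema: dict[str, Any]) -> str:
    """Approximate the widget choice for one field schema."""
    # Assumption: an enum always renders as a dropdown, regardless of type/format.
    if "enum" in schema:
        return "dropdown select"
    # Assumption: a missing type is treated as a string.
    schema_type = schema.get("type", "string")
    if schema_type == "string":
        # Formats without a dedicated widget (e.g. email, url) fall back to a text input.
        return STRING_FORMAT_WIDGETS.get(schema.get("format"), TYPE_WIDGETS["string"])
    # Unknown types also fall back to a text input.
    return TYPE_WIDGETS.get(schema_type, TYPE_WIDGETS["string"])
```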

So essentially, anything expressible with a type, format, and enum can be
handled by the React UI. That said, maybe I should try adopting the JSON
Schema format here.
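If we go the JSON Schema route, translating the proposed conn-fields keys looks mostly mechanical. A minimal sketch, using the postgres example from my original proposal: `label` becomes `title`, `range: [min, max]` splits into inclusive `minimum`/`maximum`, and `min-length`/`max-length` become `minLength`/`maxLength`. The function name and the pass-through `format` key are hypothetical, and unknown keys raise an error instead of being silently dropped.

```python
from typing import Any

# Proposed provider.yaml key -> JSON Schema keyword (straight renames).
RENAMES = {
    "type": "type",
    "label": "title",
    "enum": "enum",
    "default": "default",
    "pattern": "pattern",
    "min-length": "minLength",
    "max-length": "maxLength",
    "format": "format",  # assumption: not in the proposal, passed through for the React UI
}


def conn_fields_to_json_schema(conn_fields: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Convert a hypothetical conn-fields mapping into a JSON Schema object."""
    properties: dict[str, Any] = {}
    for name, spec in conn_fields.items():
        prop: dict[str, Any] = {}
        for key, value in spec.items():
            if key == "range":
                # range: [min, max] becomes two inclusive bounds.
                prop["minimum"], prop["maximum"] = value
            elif key in RENAMES:
                prop[RENAMES[key]] = value
            else:
                raise ValueError(f"unsupported key {key!r} in conn-field {name!r}")
        properties[name] = prop
    return {"type": "object", "properties": properties}


postgres_conn_fields = {
    "sslmode": {"type": "string", "label": "SSL Mode", "enum": ["disable", "prefer", "require"], "default": "prefer"},
    "timeout": {"type": "integer", "label": "Timeout", "range": [1, 300], "default": 30},
}
schema = conn_fields_to_json_schema(postgres_conn_fields)
```

Since `sslmode` carries an `enum`, the React UI would render it as a dropdown, and `timeout` as a number input bounded by `minimum`/`maximum`.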

Thanks & Regards,
Amogh Desai


On Fri, Jan 16, 2026 at 12:36 PM Amogh Desai <[email protected]> wrote:

> Jarek -
>
> Re backcompat: yeah, I already have the fallback in place in my POC. The
> discovery code will first try to load the metadata from YAML, and if that
> fails, it will fall back to the *python method* flow to discover the
> metadata.
>
> Re the bigger vision of API servers without providers: I love where you are
> going with this, but I think we need to split up the tasks because we
> aren't there yet. Let me explain -
>
> Your idea to discover providers via the triggerer, store them in the DB, and
> have the API server read from the DB might work for connection forms, but
> there are a few more reasons why the API server will continue to need the
> ProvidersManager:
>
> 1. Auth Managers
> 2. Secrets Backends
> 3. Providers List Endpoint: maybe we should get rid of this? IDK who the
> consumer of this endpoint is
>
> So running the API server without providers is harder than just moving
> connection forms, and we aren't there until we figure out the 3 points above.
>
> I suggest we do this instead:
>
> Phase 1: Connection forms from YAML, to establish the foundation for the
> future
> Phase 2: The DB storage phase - decide whether the Triggerer or something
> else populates the DB (maybe not the Triggerer, because we do not want it
> to have DB access eventually)
>
> Does that sound reasonable? What do you think?
>
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Fri, Jan 16, 2026 at 4:46 AM Jarek Potiuk <[email protected]> wrote:
>
>> > One main thing was the assumption that all providers need to be available
>> > on the Scheduler (? I think that changed?), so that the connection form
>> > definitions are persisted to the DB there and the API server can read them
>> > directly - no need to install providers on the API Server!
>>
>> I think the Triggerer is better than the Scheduler for persisting connection
>> definitions to the DB. Essentially, the Triggerer is the only component that
>> needs DB access and also needs to have providers installed. Any of the
>> providers might implement Triggers, and they are very tightly coupled with
>> "Hooks" and "Operators". The Scheduler only really needs **scheduler
>> plugins** (Timetables and such) and **executors** (which we eventually want
>> to split off from the current "worker" providers). It does not need "worker
>> providers".
>>
>> IMHO, in many of our discussions, this long-term plan / vision is the most
>> appealing:
>>
>> * api-server: only needs distributions that are "ui plugins" (no providers)
>> * scheduler: only needs distributions that are "scheduler plugins" (e.g.
>> timetables) and "executors"
>> * worker: only needs "worker/triggerer providers" (i.e. essentially hooks and
>> operators) and "worker plugins" (e.g. macros)
>> * triggerer: only needs "worker/triggerer providers" (as in workers) -
>> possibly "triggerer plugins" if we ever have a need for them
>>
>> Eventually, optionally, each of those ("api-server", "scheduler", "worker",
>> "triggerer") should be a separate distribution, each with its own
>> dependencies. But this only makes sense if we find that those dependencies
>> could be very different between them - it's likely this will not happen,
>> because the dependency sets for each of those "components" will be very
>> close once we finalize the current task-sdk isolation work.
>>
>> Of course we cannot do it all at once, and it will take quite some time to
>> get there.
>>
>> But I think we should treat it as a "North Star" to look at when we make
>> any "architecture" decisions, and every decision we make should bring us
>> closer to this "North Star".
>>
>> Also - just to note - with the "shared" libraries concept we already have,
>> and with "uv workspace" in our monorepo, we have ALL the mechanisms needed
>> to make it happen, and to do it in a very maintainable way with very little
>> overhead and virtually no change in the regular development workflow. For
>> example, the shared libraries concept might be used to share common code
>> between apache-airflow-providers-cncf-kubernetes (worker provider - KPO
>> essentially - installable for worker and triggerer) and (future)
>> apache-airflow-executors-cncf-kubernetes (executor installable for
>> scheduler). Same for the amazon worker provider/executor split and the edge
>> worker provider/executor split. All that is doable.
>>
>> J.
>>
>>
>>
>> On Thu, Jan 15, 2026 at 10:23 PM Jens Scheffler <[email protected]>
>> wrote:
>>
>> > Also +100 from my side.
>> >
>> > We discussed exactly this in an Airflow 3 dev call; I was looking for the
>> > notes... that was when we discussed the component split in the future.
>> > Found a reference in
>> >
>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-22August2024
>> >
>> > ```
>> >
>> > **Plan for Decoupling Providers' Connections metadata from FAB (Jens
>> > Scheffler <https://cwiki.apache.org/confluence/display/~jscheffl>)**
>> >
>> >   * Jens created this draft PR
>> >     <https://github.com/apache/airflow/pull/41656> with the POC for it
>> >     and presented it on the call.
>> >   * Jarek <https://cwiki.apache.org/confluence/display/~potiuk> proposed
>> >     the idea of dumping the JSON/YAML with connection fields in the
>> >     Database or loading it via package metadata so we don't load all the
>> >     dependencies on the webserver.
>> >   * We will need some plan for external providers on how they can define
>> >     connections or register them.
>> >   * The POC successfully proved that we can separate the connection
>> >     metadata from FAB
>> >   * /*Action Item*/: Jens
>> >     <https://cwiki.apache.org/confluence/display/~jscheffl> to create a
>> >     GitHub issue for decoupling the Connection metadata from FAB
>> >
>> > ```
>> >
>> > Also, on Sep 19th 2024 we had an overview of which pieces of the
>> > providers are needed where:
>> >
>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-19September2024
>> >
>> > The follow-up notes are in a GitHub ticket:
>> > https://github.com/apache/airflow/issues/42016
>> >
>> >
>> > One main thing was the assumption that all providers need to be available
>> > on the Scheduler (? I think that changed?), so that the connection form
>> > definitions are persisted to the DB there and the API server can read them
>> > directly - no need to install providers on the API Server!
>> >
>> > Looking forward to the contribution... I assume no VOTE needed :-D
>> >
>> > Jens
>> >
>> > On 1/15/26 15:52, Ash Berlin-Taylor wrote:
>> > > As an idea/structure I think it's certainly the right way to go:
>> > > needing the code, and the instantiated widget classes, only to (I
>> > > suspect) throw them away in the new React UI certainly seems like a
>> > > silly idea now.
>> > >
>> > > In your POC I don’t think you support the extra fields that, for
>> > > instance, the Google Cloud connection has yet, though.
>> > >
>> > > As for the schema we need to express: I’d say we should look at what
>> > > the React UI currently supports.
>> > >
>> > > -ash
>> > >
>> > >> On 15 Jan 2026, at 14:07, Amogh Desai <[email protected]> wrote:
>> > >>
>> > >> Hi All,
>> > >>
>> > >> I wanted to get feedback on something I have been twiddling with. For
>> > >> context, the API server has to import every single hook class from all
>> > >> providers just to render connection forms in the UI. This is because the
>> > >> UI metadata (which fields to show, labels, validators, etc.) lives in
>> > >> Python functions like `get_connection_form_widgets()` and
>> > >> `get_ui_field_behaviour()`, which are defined on the hook classes.
>> > >>
>> > >> This means:
>> > >> - API server startup imports 100+ hook classes it might not actually need
>> > >> - Slower startup and a heavier memory footprint
>> > >> - Poor client-server separation (why does the API server need to know
>> > >> about pyodbc just to show a UI form?)
>> > >>
>> > >> My proposal
>> > >>
>> > >> Move the UI metadata from Python code to something static / declarative
>> > >> like YAML. I want to add this information to the provider.yaml file that
>> > >> every provider already has. For example -
>> > >>
>> > >> class PostgresHook(BaseHook):
>> > >>     @classmethod
>> > >>     def get_ui_field_behaviour(cls) -> dict[str, Any]:
>> > >>         return {
>> > >>             "hidden_fields": [],
>> > >>             "relabeling": {
>> > >>                 "schema": "Database",
>> > >>             },
>> > >>         }
>> > >>
>> > >> Will become:
>> > >>
>> > >> connection-types:
>> > >>   - connection-type: postgres
>> > >>     hook-class-name: airflow.providers.postgres.hooks.postgres.PostgresHook
>> > >>
>> > >>     ui-field-behaviour:
>> > >>       hidden-fields: []
>> > >>       relabeling:
>> > >>         schema: "Database"
>> > >>
>> > >>     conn-fields:
>> > >>       sslmode:
>> > >>         type: string
>> > >>         label: SSL Mode
>> > >>         enum: ["disable", "prefer", "require"]
>> > >>         default: "prefer"
>> > >>
>> > >>       timeout:
>> > >>         type: integer
>> > >>         label: Timeout
>> > >>         range: [1, 300]
>> > >>         default: 30
>> > >>
>> > >> The schema will now consist of two new sections:
>> > >>
>> > >> 1. ui-field-behaviour
>> > >> - Used to customize the standard connection fields (host, port, login,
>> > >> etc.)
>> > >> - hidden-fields: Hide some fields
>> > >> - relabeling: Change labels for some fields (like schema -> Database above)
>> > >> - placeholders: Show hints in the form (port 5432 for example)
>> > >>
>> > >> 2. conn-fields
>> > >> - Can be used to define custom fields stored in Connection.extra
>> > >> - You can define inline validators like enum, range, pattern, min-length,
>> > >> max-length
>> > >> - Will support the standard wtforms string, integer, boolean, and number
>> > >> types
>> > >>
>> > >> As for why this schema was chosen, check the comparison with alternatives
>> > >> in the PR description: https://github.com/apache/airflow/pull/60410
>> > >>
>> > >>
>> > >> Current Status
>> > >>
>> > >> I have a POC in https://github.com/apache/airflow/pull/60410 where I
>> > >> chose two pilot providers of varying difficulty: HTTP and SMTP (HTTP is
>> > >> easy, just a vanilla form, but SMTP has some hidden fields).
>> > >>
>> > >>
>> > >> Benefits this will offer
>> > >>
>> > >> - Once complete, the API server won't import any hook classes for UI
>> > >> rendering, leading to faster startup
>> > >> - Provider dependencies don't affect the API server
>> > >> - YAML is easier to read/write than Python functions for form metadata
>> > >>
>> > >> Would love feedback on:
>> > >> 1. Schema design - does it cover your use cases?
>> > >> 2. Any missing field types or validators?
>> > >>
>> > >> The goal is to get the pilot providers in so we can start migrating
>> > >> providers incrementally. The old way still works, so there is no rush
>> > >> for everyone to migrate at once.
>> > >>
>> > >> Thoughts?
>> > >>
>> > >> Thanks & Regards,
>> > >> Amogh Desai
>> > >
>>
>
