Re: [PROPOSAL] Moving connection form UI metadata to provider.yaml

Amogh Desai Fri, 16 Jan 2026 02:14:15 -0800

Thanks for the suggestion to use jsonschema!

I updated the implementation to use jsonschema instead of the custom
format. For example, the structure now looks like this:


conn-fields:
  timeout:
    label: "Connection Timeout"
    description: "Timeout in seconds"
    schema:
      type: integer
      minimum: 1
      maximum: 300
      default: 30
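
To make the change from the earlier flat format concrete, here is a minimal sketch of how one field definition could be converted to the nested form above. This is illustrative only and not the code in the PR: the function name `to_jsonschema_field`, the `UI_KEYS` set, and the `RENAMES` table are assumptions, and the only mappings shown are the ones implied by this thread (`range` to `minimum`/`maximum`, `min-length`/`max-length` to the JSON Schema keywords `minLength`/`maxLength`).

```python
from typing import Any

# Keys treated as UI metadata and kept at the top level (label, description,
# sensitive, as mentioned in this thread). Anything else is ASSUMED to be a
# validation property and is moved under "schema".
UI_KEYS = {"label", "description", "sensitive"}

# Hypothetical renames from the earlier custom format to JSON Schema keywords.
RENAMES = {"min-length": "minLength", "max-length": "maxLength"}


def to_jsonschema_field(old: dict[str, Any]) -> dict[str, Any]:
    """Convert one field from the earlier flat format to the nested format."""
    new: dict[str, Any] = {}
    schema: dict[str, Any] = {}
    for key, value in old.items():
        if key in UI_KEYS:
            new[key] = value
        elif key == "range":
            # range: [low, high] becomes minimum / maximum
            schema["minimum"], schema["maximum"] = value
        else:
            schema[RENAMES.get(key, key)] = value
    new["schema"] = schema
    return new


# The "timeout" field as written in the original proposal.
old_timeout = {"type": "integer", "label": "Timeout", "range": [1, 300], "default": 30}
new_timeout = to_jsonschema_field(old_timeout)
print(new_timeout)
```

The point of the split is that everything under `schema` can be handed to a JSON Schema validator unchanged, while the top-level keys are only read by the form renderer.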

As for the concerns regarding GCP (14 fields including string, int,
boolean, and password), I tested it and it works well (updated on the PR).
The code now uses a schema object for all jsonschema validation properties
(minimum, maximum, pattern, enum, etc.) while keeping UI metadata such as
label, description, and sensitive at the top level. This aligns better
with the React UI, which already expects this format.
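
As a rough illustration of what "validation properties live under schema" means in practice, the sketch below checks a value against a small subset of keywords (`type`, `minimum`, `maximum`, `enum`). It is a hand-rolled, standard-library-only stand-in written for this email, not the validation code in the PR; a real implementation would more likely delegate to a JSON Schema validator library. The helper name `check_value` is an assumption, and it is stricter than JSON Schema in one respect: it rejects `1.0` as an integer.

```python
from typing import Any


def check_value(value: Any, schema: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the value passed."""
    expected = schema.get("type")
    # bool is a subclass of int in Python, so exclude it from numeric checks.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected == "integer" and not (is_number and isinstance(value, int)):
        return [f"expected integer, got {type(value).__name__}"]
    if expected == "string" and not isinstance(value, str):
        return [f"expected string, got {type(value).__name__}"]
    if expected == "boolean" and not isinstance(value, bool):
        return [f"expected boolean, got {type(value).__name__}"]

    errors: list[str] = []
    if is_number and "minimum" in schema and value < schema["minimum"]:
        errors.append(f"{value} is less than the minimum of {schema['minimum']}")
    if is_number and "maximum" in schema and value > schema["maximum"]:
        errors.append(f"{value} is greater than the maximum of {schema['maximum']}")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{value!r} is not one of {schema['enum']}")
    return errors


# Mirrors the YAML example: UI metadata at the top level, rules under "schema".
timeout_field = {
    "label": "Connection Timeout",
    "description": "Timeout in seconds",
    "schema": {"type": "integer", "minimum": 1, "maximum": 300, "default": 30},
}

print(check_value(30, timeout_field["schema"]))
print(check_value(0, timeout_field["schema"]))
```

Only `timeout_field["schema"]` is passed to the checker; `label` and `description` never take part in validation, which is the separation described above.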

Thanks & Regards,
Amogh Desai


On Fri, Jan 16, 2026 at 12:50 PM Amogh Desai <[email protected]> wrote:

> Ash -
>
> Good catch on the GCP concern. I checked it and this is what it uses:
>
>     @classmethod
>     def get_connection_form_widgets(cls) -> dict[str, Any]:
>         """Return connection widgets to add to connection form."""
>         from flask_appbuilder.fieldwidgets import BS3PasswordFieldWidget, BS3TextFieldWidget
>         from flask_babel import lazy_gettext
>         from wtforms import BooleanField, IntegerField, PasswordField, StringField
>         from wtforms.validators import NumberRange
>
>         return {
>             "project": StringField(lazy_gettext("Project Id"), widget=BS3TextFieldWidget()),
>             "key_path": StringField(lazy_gettext("Keyfile Path"), widget=BS3TextFieldWidget()),
>             "keyfile_dict": PasswordField(lazy_gettext("Keyfile JSON"), widget=BS3PasswordFieldWidget()),
>             "credential_config_file": StringField(
>                 lazy_gettext("Credential Configuration File"), widget=BS3TextFieldWidget()
>             ),
>             "scope": StringField(lazy_gettext("Scopes (comma separated)"), widget=BS3TextFieldWidget()),
>             "key_secret_name": StringField(
>                 lazy_gettext("Keyfile Secret Name (in GCP Secret Manager)"), widget=BS3TextFieldWidget()
>             ),
>             "key_secret_project_id": StringField(
>                 lazy_gettext("Keyfile Secret Project Id (in GCP Secret Manager)"), widget=BS3TextFieldWidget()
>             ),
>             "num_retries": IntegerField(
>                 lazy_gettext("Number of Retries"),
>                 validators=[NumberRange(min=0)],
>                 widget=BS3TextFieldWidget(),
>                 default=5,
>             ),
>             "impersonation_chain": StringField(
>                 lazy_gettext("Impersonation Chain"), widget=BS3TextFieldWidget()
>             ),
>             "idp_issuer_url": StringField(
>                 lazy_gettext("IdP Token Issue URL (Client Credentials Grant Flow)"),
>                 widget=BS3TextFieldWidget(),
>             ),
>             "client_id": StringField(
>                 lazy_gettext("Client ID (Client Credentials Grant Flow)"), widget=BS3TextFieldWidget()
>             ),
>             "client_secret": StringField(
>                 lazy_gettext("Client Secret (Client Credentials Grant Flow)"),
>                 widget=BS3PasswordFieldWidget(),
>             ),
>             "idp_extra_parameters": StringField(
>                 lazy_gettext("IdP Extra Request Parameters"), widget=BS3TextFieldWidget()
>             ),
>             "is_anonymous": BooleanField(
>                 lazy_gettext("Anonymous credentials (ignores all other settings)"), default=False
>             ),
>         }
>
>     @classmethod
>     def get_ui_field_behaviour(cls) -> dict[str, Any]:
>         """Return custom field behaviour."""
>         return {
>             "hidden_fields": ["host", "schema", "login", "password", "port", "extra"],
>             "relabeling": {},
>         }
>
> All of these are covered by my schema.
>
> I also checked what the react UI supports as of now, and this is what I found:
>
> string - Text input
> integer - Number input
> number - Number input
> boolean - Checkbox
> object - JSON object editor
> array - Array input
>
> String Formats:
> format: "password" - Masked password field
> format: "multiline" - Textarea
> format: "date" - Date picker
> format: "date-time" - DateTime picker
> format: "time" - Time picker
>
> Array Types
>
> This all comes from a field selector logic:
> https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/components/FlexibleForm/FieldSelector.tsx#L58-L92
> .
>
> Fields are selected based on
> - `schema.type` (string, integer, boolean, array, object)
> - `schema.format` (password, multiline, date, date-time, time, email, url)
> - `schema.enum` (if present, dropdown select)
>
> So essentially anything with a type, format, and enum defined can be
> handled by react UI. That said, maybe I should
> try and adopt using jsonschema format here.
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Fri, Jan 16, 2026 at 12:36 PM Amogh Desai <[email protected]>
> wrote:
>
>> Jarek -
>>
>> Re backcompat, yeah, I already have the fallback in place in my POC. The
>> discovery code
>> will first try to load the metadata from yaml, and if it fails to do so,
>> it will use the *python method*
>> flow to discover the metadata.
>>
>> Re the bigger vision about API servers without providers, I love where
>> you are going with this, but
>> I think we need to split up the tasks because we aren't there yet. Let me
>> explain -
>>
>> Your idea to discover providers via triggerer, store in DB and API server
>> reads from DB might work
>> for connection forms, but there are a few more reasons why API server
>> will continue to need the
>> ProvidersManager:
>>
>> 1. Auth Managers
>> 2. Secrets Backends
>> 3. Providers List Endpoint: maybe we should get rid of this? IDK who the
>> consumer of this endpoint is
>>
>> So the API server without Providers thing is harder than just connection
>> forms and we aren't there yet
>> until we figure out the 3 points from above.
>>
>> I suggest we do this instead:
>>
>> Phase 1: Connection forms from YAML to establish foundation for the future
>> Phase 2: The DB storage phase - decide who populates the DB (maybe not
>> the Triggerer, because we do not want it to have DB access eventually)
>>
>> Does that sound reasonable? What do you think?
>>
>>
>> Thanks & Regards,
>> Amogh Desai
>>
>>
>> On Fri, Jan 16, 2026 at 4:46 AM Jarek Potiuk <[email protected]> wrote:
>>
>>> > One main thing was assuming that all providers need to be available on
>>> > Scheduler (? I think that changed?) that there the connection form
>>> >  definitions are persisted to DB such that the API server directly can
>>> > read from there - no need to install providers on API Server!
>>>
>>> I think Triggerer is better than Scheduler to persist connection
>>> definition
>>> to the DB. Essentially Triggerer is the only component that needs DB
>>> access
>>> and also needs to have providers installed. Any of the providers might
>>> implement Triggers and they are very tightly coupled with "Hooks" and
>>> "Operators".  Scheduler only really needs **scheduler plugins**
>>> (Timetables
>>> and such) and **executors** (which we eventually want to split-off from
>>> current "worker" providers). It does not need "worker providers".
>>>
>>> IMHO in many discussions of ours this long term plan / vision is most
>>> appealing:
>>>
>>> * api-server: only needs distributions that are "ui plugins" (no
>>> providers)
>>> * scheduler only needs distributions that are "scheduler plugins" (e.g.
>>> timetables) and "executors"
>>> * worker only needs "worker/triggerer providers" (i.e. hooks and
>>> operators
>>> essentially) and "worker plugins" (e.g. macros)
>>> * triggerer only needs "worker/triggerer providers" (as in workers) -
>>> possibly "triggerer plugins" if we ever have a need to have them
>>>
>>> Eventually, optionally, each of those ("api-server", "scheduler",
>>> "worker", "triggerer") should be a separate distribution. Each with its
>>> own
>>> dependencies. But this one only makes sense if we find that those
>>> dependencies could be very different between those - it's likely this
>>> will
>>> not happen, because dependency-set for each of those "components" will be
>>> very close when we finalize the current task-sdk isolation work.
>>>
>>> Of course we cannot do it all at once and it will take quite some time to
>>> get there.
>>>
>>> But I think we should have it as a "North Star" that we should look at
>>> when
>>> we make any "architecture" decisions.  And every decision we make should
>>> bring us closer to this "North Star".
>>>
>>> Also - just to note - with the "shared" libraries concept we already
>>> have,
>>> and with "uv workspace" in our monorepo - we have ALL the mechanisms
>>> needed
>>> to make it happen. And to do it in a very maintainable way with very
>>> little
>>> overhead and virtually no change in regular development workflow. For
>>> example the shared libraries concept might be used to share common code
>>> for
>>> both: apache-airflow-providers-cncf-kubernetes (worker provider - KPO
>>> essentially - installable for worker and triggerer) and (future)
>>> apache-airflow-executors-cncf-kubernetes (executor installable for
>>> scheduler). Same for amazon worker provider/executor split and edge
>>> worker
>>> provider/executor split. All that is doable.
>>>
>>> J.
>>>
>>>
>>>
>>> On Thu, Jan 15, 2026 at 10:23 PM Jens Scheffler <[email protected]>
>>> wrote:
>>>
>>> > Also +100 from my side.
>>> >
>>> > We discussed exactly this in an Airflow 3 dev call, I was looking for
>>> the
>>> > notes... that was when we discussed about the component split in the
>>> > future. Found a reference in
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-22August2024
>>> >
>>> > ```
>>> >
>>> > **Plan for Decoupling Providers's Connections metadata from FAB (Jens
>>> > Scheffler <https://cwiki.apache.org/confluence/display/~jscheffl>)**
>>> >
>>> >   * Jens created this draft PR
>>> >     <https://github.com/apache/airflow/pull/41656> with the POC for it
>>> >     and presented it on the call.
>>> >   * Jarek <https://cwiki.apache.org/confluence/display/~potiuk>
>>> proposed
>>> >     the idea of dumping the JSON/YAML with connection fields in the
>>> >     Database or loading it via package metadata so we don't load all
>>> the
>>> >     dependencies on the webserver.
>>> >   * We will need some plan for external providers on how they can
>>> define
>>> >     connections or register them.
>>> >   * The POC successfully proved that we can separate the connection
>>> >     metadata from FAB
>>> >   * /*Action Item*/: Jens
>>> >     <https://cwiki.apache.org/confluence/display/~jscheffl> to create
>>> a
>>> >     GitHub issue for decoupling the Connection metadata from FAB
>>> >
>>> > ```
>>> >
>>> > Also on Sep 19th 2024 we had an overview which pieces of the providers
>>> > are needed where:
>>> >
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-19September2024
>>> >
>>> > Follow-up notes are in the GitHub ticket:
>>> > https://github.com/apache/airflow/issues/42016
>>> >
>>> >
>>> > One main thing was assuming that all providers need to be available on
>>> > Scheduler (? I think that changed?) that there the connection form
>>> > definitions are persisted to DB such that the API server directly can
>>> > read from there - no need to install providers on API Server!
>>> >
>>> > Looking forward to the contribution... I assume no VOTE needed :-D
>>> >
>>> > Jens
>>> >
>>> > On 1/15/26 15:52, Ash Berlin-Taylor wrote:
>>> > > As an idea/structure I think it's certainly the right way to go - not
>>> > needing the code, not the instantiated widget classes, to (I suspect)
>>> throw
>>> > them away in the new React UI certainly seems like a silly idea now.
>>> > >
>>> > > In your POC I don’t think you have got the ability to have the extra
>>> > fields that, for instance, Google Cloud connection has yet though.
>>> > >
>>> > > As for the schema we need to express: I’d say we should look at what
>>> the
>>> > react UI currently supports?
>>> > >
>>> > > -ash
>>> > >
>>> > >> On 15 Jan 2026, at 14:07, Amogh Desai <[email protected]> wrote:
>>> > >>
>>> > >> Hi All,
>>> > >>
>>> > >> I wanted to get feedback on something I have been twiddling with.
>>> For
>>> > >> context, the API server has to import
>>> > >> every single hook class from all providers just to render connection
>>> > forms
>>> > >> in the UI. This is because the UI
>>> > >> metadata (what fields to show, labels, validators, etc.) are living
>>> in
>>> > >> python functions like `get_connection_form_widgets()`
>>> > >> and `get_ui_field_behaviour()` which are defined on the hook
>>> classes.
>>> > >>
>>> > >> This means:
>>> > >> - API server startup imports 100+ hook classes it might not actually
>>> > need
>>> > >> - Slower startup and a heavier memory footprint
>>> > >> - Poor client-server separation (why does the API server need to
>>> know
>>> > about
>>> > >> pyodbc just to show a UI form?)
>>> > >>
>>> > >> My proposal
>>> > >>
>>> > >> Moving the UI metadata from python code to something static /
>>> > declarative
>>> > >> like yaml. I want to add this information
>>> > >> in the provider.yaml file that every provider already has. For
>>> example -
>>> > >>
>>> > >> class PostgresHook(BaseHook):
>>> > >>     @classmethod
>>> > >>     def get_ui_field_behaviour(cls) -> dict[str, Any]:
>>> > >>         return {
>>> > >>             "hidden_fields": [],
>>> > >>             "relabeling": {
>>> > >>                 "schema": "Database",
>>> > >>             },
>>> > >>         }
>>> > >>
>>> > >> Will become:
>>> > >>
>>> > >> connection-types:
>>> > >>   - connection-type: postgres
>>> > >>     hook-class-name: airflow.providers.postgres.hooks.postgres.PostgresHook
>>> > >>
>>> > >>     ui-field-behaviour:
>>> > >>       hidden-fields: []
>>> > >>       relabeling:
>>> > >>         schema: "Database"
>>> > >>
>>> > >>     conn-fields:
>>> > >>       sslmode:
>>> > >>         type: string
>>> > >>         label: SSL Mode
>>> > >>         enum: ["disable", "prefer", "require"]
>>> > >>         default: "prefer"
>>> > >>
>>> > >>       timeout:
>>> > >>         type: integer
>>> > >>         label: Timeout
>>> > >>         range: [1, 300]
>>> > >>         default: 30
>>> > >>
>>> > >> The schema will now consist of two new sections:
>>> > >>
>>> > >> 1. ui-field-behaviour
>>> > >> - Used to customize the standard connection fields (host, port,
>>> login,
>>> > etc.)
>>> > >> - hidden-fields: Hide some fields
>>> > >> - relabeling: Change labels for some fields (like schema -> Database
>>> > above)
>>> > >> - placeholders: Show hints in the form (port 5432 for example)
>>> > >>
>>> > >> 2. conn-fields
>>> > >> - Can be used to define custom fields stored in Connection.extra
>>> > >> - You can define inline validators like enum, range, pattern,
>>> > min-length,
>>> > >> max-length
>>> > >> - Will support the standard wtforms string, integer, boolean, number
>>> > types
>>> > >>
>>> > >> As for why this schema was chosen, check the comparison with
>>> > alternative in
>>> > >> the PR
>>> > >> desc: https://github.com/apache/airflow/pull/60410
>>> > >>
>>> > >>
>>> > >> Current Status
>>> > >>
>>> > >> I have a POC in: https://github.com/apache/airflow/pull/60410 where
>>> I
>>> > chose
>>> > >> two pilot providers of
>>> > >> varying difficulty: HTTP and SMTP (HTTP is easy, just a vanilla
>>> form but
>>> > >> SMTP has some hidden fields).
>>> > >>
>>> > >>
>>> > >> Benefits this will offer
>>> > >>
>>> > >> - Once complete, the API server won't import any hook classes for UI
>>> > >> rendering leading to faster startup
>>> > >> - Provider dependencies don't affect API server
>>> > >> - YAML is easier to read/write than python functions for form
>>> metadata
>>> > >>
>>> > >> Would love feedback on:
>>> > >> 1. Schema design - does it cover your use cases?
>>> > >> 2. Any missing field types or validators?
>>> > >>
>>> > >> The goal is to get the pilot providers in so we can start migrating
>>> > >> providers incrementally. Old way still
>>> > >> works, so no rush for everyone to migrate at once.
>>> > >>
>>> > >> Thoughts?
>>> > >>
>>> > >> Thanks & Regards,
>>> > >> Amogh Desai
>>> > >
>>> > >
>>>
>>
