Re: [PROPOSAL] Moving connection form UI metadata to provider.yaml

Jens Scheffler Fri, 16 Jan 2026 11:53:08 -0800

+100 still - especially on the JSON schema thing.

JSON Schema was once decided to be the base of Params, and the very first AIP-50 trigger form was built on it and evolved into today's trigger UI as well. Everything is transferred internally as JSON Schema. So it is great that you caught up on this; a custom schema would have been bad. It also allows for future extension and added validation which we might not support today in the Trigger Form, and more features can be plugged in in the future.


On 16.01.26 15:23, Jarek Potiuk wrote:

There are a few more reasons why API server will continue to need the
ProvidersManager:

Yeah, I was aware we likely have a few more things I forgot, but this idea
extends to those nicely:

1. Auth Managers -> I consider this as an api-server plugin :), or possibly
a separate (apache-airflow-auth-manager) type of distribution (again this
will work nicely with the "shared" library concept)
2. Secrets Backends -> not sure if that is needed for api-server (maybe
just for configuration retrieval?) - this again can be a plugin, or a
separate distribution (apache-airflow-secrets-backend)
3. Providers List Endpoint: maybe we should get rid of this? -> Eventually
this should be part of the same Triggerer DB storage -> the triggerer
should store the list of installed providers in the DB. What we currently
have in api-server is already kinda wrong, because even now we can
potentially have different providers installed on api-server and in
workers/triggerers, and only those installed on api-server will show up.
Switching it to read from a DB that is updated by the Triggerer (also
including team_id, as there might be different sets of providers for
different teams) will make it "correct" (eventually).

But Yeah. We definitely can defer any of that to be done later, if we do
not find it "easier" to do it together - absolutely no pressure there, just
wanted to make sure the "North star" is quite commonly agreed, so that we
know where we are going :). We can definitely proceed with the current POC
"as is"

J.


On Fri, Jan 16, 2026 at 11:11 AM Amogh Desai <[email protected]> wrote:

Thanks for the suggestion for using jsonschema!

I updated the implementation to use jsonschema instead of the custom
format. Now the structure looks like this for example:

conn-fields:
   timeout:
     label: "Connection Timeout"
     description: "Timeout in seconds"
     schema:
       type: integer
       minimum: 1
       maximum: 300
       default: 30

As for the concerns regarding GCP (14 fields including string, int,
boolean, and password), I tested it and it
works well (updated on PR). The code now uses schema object for all
jsonschema validation properties like min, max, pattern,
enum, etc while keeping UI metadata like label, description, sensitive or
not at the top level. This aligns
better with the react UI which already expects this format.
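
To illustrate the split between top-level UI metadata and the nested schema object, here is a minimal sketch in Python. It is not the code from the PR: the `validate_conn_field` helper is hypothetical and hand-rolls only the `type: integer`, `minimum`, `maximum` and `default` keywords from the example above. A real implementation would presumably delegate to a proper JSON Schema validator.

```python
from typing import Any

# The "timeout" field from the example above, as it might look once the YAML is parsed.
TIMEOUT_FIELD: dict[str, Any] = {
    "label": "Connection Timeout",
    "description": "Timeout in seconds",
    "schema": {"type": "integer", "minimum": 1, "maximum": 300, "default": 30},
}


def validate_conn_field(field: dict[str, Any], value: Any = None) -> list[str]:
    """Hypothetical helper: check a value against a tiny subset of JSON Schema.

    UI metadata (label, description) is ignored; only the nested schema is used.
    """
    schema = field.get("schema", {})
    if value is None:
        # Fall back to the declared default when no value was entered.
        value = schema.get("default")

    # bool is a subclass of int in Python, but JSON Schema does not treat it as a number.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if schema.get("type") == "integer" and not (is_number and isinstance(value, int)):
        return [f"{value!r} is not an integer"]

    errors: list[str] = []
    if is_number:
        if "minimum" in schema and value < schema["minimum"]:
            errors.append(f"{value} is less than the minimum of {schema['minimum']}")
        if "maximum" in schema and value > schema["maximum"]:
            errors.append(f"{value} is greater than the maximum of {schema['maximum']}")
    return errors
```

With this sketch, an empty list means the value passed the checks; `validate_conn_field(TIMEOUT_FIELD, 0)` would report a single error about the minimum.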

Thanks & Regards,
Amogh Desai


On Fri, Jan 16, 2026 at 12:50 PM Amogh Desai <[email protected]>
wrote:

Ash -

Good catch on the GCP concern. I checked it and this is what it uses:

     @classmethod
     def get_connection_form_widgets(cls) -> dict[str, Any]:
         """Return connection widgets to add to connection form."""
         from flask_appbuilder.fieldwidgets import BS3PasswordFieldWidget, BS3TextFieldWidget
         from flask_babel import lazy_gettext
         from wtforms import BooleanField, IntegerField, PasswordField, StringField
         from wtforms.validators import NumberRange

         return {
             "project": StringField(lazy_gettext("Project Id"), widget=BS3TextFieldWidget()),
             "key_path": StringField(lazy_gettext("Keyfile Path"), widget=BS3TextFieldWidget()),
             "keyfile_dict": PasswordField(lazy_gettext("Keyfile JSON"), widget=BS3PasswordFieldWidget()),
             "credential_config_file": StringField(
                 lazy_gettext("Credential Configuration File"), widget=BS3TextFieldWidget()
             ),
             "scope": StringField(lazy_gettext("Scopes (comma separated)"), widget=BS3TextFieldWidget()),
             "key_secret_name": StringField(
                 lazy_gettext("Keyfile Secret Name (in GCP Secret Manager)"), widget=BS3TextFieldWidget()
             ),
             "key_secret_project_id": StringField(
                 lazy_gettext("Keyfile Secret Project Id (in GCP Secret Manager)"), widget=BS3TextFieldWidget()
             ),
             "num_retries": IntegerField(
                 lazy_gettext("Number of Retries"),
                 validators=[NumberRange(min=0)],
                 widget=BS3TextFieldWidget(),
                 default=5,
             ),
             "impersonation_chain": StringField(
                 lazy_gettext("Impersonation Chain"), widget=BS3TextFieldWidget()
             ),
             "idp_issuer_url": StringField(
                 lazy_gettext("IdP Token Issue URL (Client Credentials Grant Flow)"),
                 widget=BS3TextFieldWidget(),
             ),
             "client_id": StringField(
                 lazy_gettext("Client ID (Client Credentials Grant Flow)"), widget=BS3TextFieldWidget()
             ),
             "client_secret": StringField(
                 lazy_gettext("Client Secret (Client Credentials Grant Flow)"),
                 widget=BS3PasswordFieldWidget(),
             ),
             "idp_extra_parameters": StringField(
                 lazy_gettext("IdP Extra Request Parameters"), widget=BS3TextFieldWidget()
             ),
             "is_anonymous": BooleanField(
                 lazy_gettext("Anonymous credentials (ignores all other settings)"), default=False
             ),
         }

     @classmethod
     def get_ui_field_behaviour(cls) -> dict[str, Any]:
         """Return custom field behaviour."""
         return {
             "hidden_fields": ["host", "schema", "login", "password", "port", "extra"],
             "relabeling": {},
         }

All of these are covered by my schema.

I also checked what the react UI supports as of now, and this is what I found:

string - Text input
integer - Number input
number - Number input
boolean - Checkbox
object - JSON object editor
array - Array input

String Formats:
format: "password" - Masked password field
format: "multiline" - Textarea
format: "date" - Date picker
format: "date-time" - DateTime picker
format: "time" - Time picker

Array Types

This all comes from the field selector logic:

https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/components/FlexibleForm/FieldSelector.tsx#L58-L92

Fields are selected based on
- `schema.type` (string, integer, boolean, array, object)
- `schema.format` (password, multiline, date, date-time, time, email, url)
- `schema.enum` (if present, dropdown select)

So essentially anything with a type, format, and enum defined can be
handled by react UI. That said, maybe I should
try and adopt using jsonschema format here.
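
The selection rules listed above can be paraphrased as a small Python sketch. This is not a port of `FieldSelector.tsx`: the `select_widget` function and the widget names are made up for illustration, the precedence (enum first, then format, then type) is an assumption, and formats without a dedicated widget in the list above (such as email and url) simply fall back to a text input here.

```python
from typing import Any

# String formats with a dedicated widget, as listed in the message above.
STRING_FORMAT_WIDGETS = {
    "password": "masked password field",
    "multiline": "textarea",
    "date": "date picker",
    "date-time": "datetime picker",
    "time": "time picker",
}

# Widgets per schema type, as listed in the message above.
TYPE_WIDGETS = {
    "string": "text input",
    "integer": "number input",
    "number": "number input",
    "boolean": "checkbox",
    "object": "JSON object editor",
    "array": "array input",
}


def select_widget(schema: dict[str, Any]) -> str:
    """Pick a widget name from `enum`, `format` and `type` (precedence is an assumption)."""
    if "enum" in schema:
        return "dropdown select"
    field_type = schema.get("type", "string")
    if field_type == "string":
        # Unknown or missing formats fall back to a plain text input in this sketch.
        return STRING_FORMAT_WIDGETS.get(schema.get("format", ""), "text input")
    return TYPE_WIDGETS.get(field_type, "text input")
```

For example, `select_widget({"type": "string", "format": "password"})` picks the masked password field, while any schema carrying an `enum` gets the dropdown.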

Thanks & Regards,
Amogh Desai


On Fri, Jan 16, 2026 at 12:36 PM Amogh Desai <[email protected]>
wrote:

Jarek -

Re backcompat, yeah, I already have the fallback in place in my POC. The
discovery code
will first try to load the metadata from yaml, and if it fails to do so,
it will use the *python method*
flow to discover the metadata.
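
The fallback order described above could look roughly like the sketch below. It is not the POC code: `discover_conn_fields` and the `load_from_hook` callback are hypothetical names, and treating a missing `conn-fields` entry as "YAML failed" is an assumption about what counts as failure.

```python
from typing import Any, Callable

ConnFields = dict[str, Any]


def discover_conn_fields(
    provider_yaml: dict[str, Any],
    connection_type: str,
    load_from_hook: Callable[[str], ConnFields],
) -> tuple[str, ConnFields]:
    """Return (source, fields), preferring YAML metadata over the hook's Python method."""
    for entry in provider_yaml.get("connection-types", []):
        if entry.get("connection-type") == connection_type and "conn-fields" in entry:
            # New path: static metadata, no hook import needed.
            return "yaml", entry["conn-fields"]
    # Fallback: the legacy path, which needs the hook class to be importable.
    return "python", load_from_hook(connection_type)


# Stand-in data for illustration only.
EXAMPLE_PROVIDER_YAML = {
    "connection-types": [
        {"connection-type": "postgres", "conn-fields": {"sslmode": {"label": "SSL Mode"}}},
        {"connection-type": "legacy_conn"},  # not migrated yet
    ]
}


def example_legacy_loader(connection_type: str) -> ConnFields:
    """Stand-in for importing the hook and calling its classmethod."""
    return {"some_field": {"label": f"from {connection_type} hook"}}
```

The point of the ordering is that a migrated provider never triggers the hook import, while an unmigrated one keeps working unchanged.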

Re the bigger vision about API servers without providers, I love where
you are going with this, but I think we need to split up the tasks
because we aren't there yet. Let me explain -

Your idea to discover providers via triggerer, store in DB and API server
reads from DB might work for connection forms, but there are a few more
reasons why API server will continue to need the ProvidersManager:

1. Auth Managers
2. Secrets Backends
3. Providers List Endpoint: maybe we should get rid of this? IDK who the
consumer of this endpoint is

So the API server without Providers thing is harder than just connection
forms and we aren't there yet
until we figure out the 3 points from above.

I suggest we do this instead:

Phase 1: Connection forms from YAML to establish foundation for the future

Phase 2: The DB storage phase - decide if Triggerer / who populates in DB
(Maybe not triggerer because we do not want it to have DB access
eventually)

Does that sound reasonable? What do you think?


Thanks & Regards,
Amogh Desai


On Fri, Jan 16, 2026 at 4:46 AM Jarek Potiuk <[email protected]> wrote:

One main thing was assuming that all providers need to be available on
Scheduler (? I think that changed?) that there the connection form
definitions are persisted to DB such that the API server directly can
read from there - no need to install providers on API Server!

I think Triggerer is better than Scheduler to persist connection
definitions to the DB. Essentially Triggerer is the only component that
needs DB access and also needs to have providers installed. Any of the
providers might implement Triggers and they are very tightly coupled with
"Hooks" and "Operators". Scheduler only really needs **scheduler plugins**
(Timetables and such) and **executors** (which we eventually want to
split off from current "worker" providers). It does not need "worker
providers".

IMHO in many discussions of ours this long term plan / vision is most
appealing:

* api-server: only needs distributions that are "ui plugins" (no
  providers)
* scheduler: only needs distributions that are "scheduler plugins" (e.g.
  timetables) and "executors"
* worker: only needs "worker/triggerer providers" (i.e. hooks and
  operators essentially) and "worker plugins" (e.g. macros)
* triggerer: only needs "worker/triggerer providers" (as in workers) -
  possibly "triggerer plugins" if we ever have a need to have them

Eventually, optionally, each of those ("api-server", "scheduler",
"worker", "triggerer") should be a separate distribution, each with its
own dependencies. But this one only makes sense if we find that those
dependencies could be very different between those - it's likely this
will not happen, because the dependency set for each of those "components"
will be very close when we finalize the current task-sdk isolation work.

Of course we cannot do it all at once and it will take quite some time to
get there.

But I think we should have it as a "North Star" that we should look at
when we make any "architecture" decisions. And every decision we make
should bring us closer to this "North Star".

Also - just to note - with the "shared" libraries concept we already
have, and with "uv workspace" in our monorepo - we have ALL the mechanisms
needed to make it happen. And to do it in a very maintainable way with
very little overhead and virtually no change in regular development
workflow. For example the shared libraries concept might be used to share
common code for both: apache-airflow-providers-cncf-kubernetes (worker
provider - KPO essentially - installable for worker and triggerer) and
(future) apache-airflow-executors-cncf-kubernetes (executor installable
for scheduler). Same for amazon worker provider/executor split and edge
worker provider/executor split. All that is doable.

J.



On Thu, Jan 15, 2026 at 10:23 PM Jens Scheffler <[email protected]>
wrote:

Also +100 from my side.

We discussed exactly this in an Airflow 3 dev call; I was looking for the
notes... that was when we discussed the component split in the
future. Found a reference in

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-22August2024

```

**Plan for Decoupling Providers's Connections metadata from FAB (Jens
Scheffler <https://cwiki.apache.org/confluence/display/~jscheffl>)**

   * Jens created this draft PR
     <https://github.com/apache/airflow/pull/41656> with the POC for it
     and presented it on the call.
   * Jarek <https://cwiki.apache.org/confluence/display/~potiuk> proposed
     the idea of dumping the JSON/YAML with connection fields in the
     Database or loading it via package metadata so we don't load all the
     dependencies on the webserver.
   * We will need some plan for external providers on how they can define
     connections or register them.
   * The POC successfully proved that we can separate the connection
     metadata from FAB
   * /*Action Item*/: Jens
     <https://cwiki.apache.org/confluence/display/~jscheffl> to create
     GitHub issue for decoupling the Connection metadata from FAB

```

Also on Sep 19th 2024 we had an overview of which pieces of the providers
are needed where:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-19September2024

Follow-up notes are in the GitHub ticket:
https://github.com/apache/airflow/issues/42016


One main thing was assuming that all providers need to be available on
Scheduler (? I think that changed?) that there the connection form
definitions are persisted to DB such that the API server directly can
read from there - no need to install providers on API Server!

Looking forward to the contribution... I assume no VOTE needed :-D

Jens

On 1/15/26 15:52, Ash Berlin-Taylor wrote:

As an idea/structure I think it's certainly the right way to go: needing
the code, and the instantiated widget classes, only to (I suspect) throw
them away in the new React UI certainly seems like a silly idea now.

In your POC I don’t think you have got the ability to have the extra
fields that, for instance, Google Cloud connection has yet though.

As for the schema we need to express: I’d say we should look at what the
react UI currently supports?

-ash

On 15 Jan 2026, at 14:07, Amogh Desai <[email protected]> wrote:

Hi All,

I wanted to get feedback on something I have been twiddling with. For
context, the API server has to import every single hook class from all
providers just to render connection forms in the UI. This is because the
UI metadata (what fields to show, labels, validators, etc.) lives in
python functions like `get_connection_form_widgets()` and
`get_ui_field_behaviour()` which are defined on the hook classes.

This means:
- API server startup imports 100+ hook classes it might not actually need
- Slower startup due to heavier memory footprint
- Poor client-server separation (why does the API server need to know
  about pyodbc just to show a UI form?)

My proposal

Moving the UI metadata from python code to something static / declarative
like yaml. I want to add this information in the provider.yaml file that
every provider already has. For example -

class PostgresHook(BaseHook):
     @classmethod
     def get_ui_field_behaviour(cls) -> dict[str, Any]:
         return {
             "hidden_fields": [],
             "relabeling": {
                 "schema": "Database",
             },
         }

Will become:

connection-types:
   - connection-type: postgres
     hook-class-name: airflow.providers.postgres.hooks.postgres.PostgresHook

     ui-field-behaviour:
       hidden-fields: []
       relabeling:
         schema: "Database"

     conn-fields:
       sslmode:
         type: string
         label: SSL Mode
         enum: ["disable", "prefer", "require"]
         default: "prefer"

       timeout:
         type: integer
         label: Timeout
         range: [1, 300]
         default: 30

The schema will now consist of two new sections:

1. ui-field-behaviour
- Used to customize the standard connection fields (host, port, login,
  etc.)
- hidden-fields: Hide some fields
- relabeling: Change labels for some fields (like schema -> Database
  above)
- placeholders: Show hints in the form (port 5432 for example)

2. conn-fields
- Can be used to define custom fields stored in Connection.extra
- You can define inline validators like enum, range, pattern, min-length,
  max-length
- Will support the standard wtforms string, integer, boolean, number
  types
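
As a rough illustration of how the `ui-field-behaviour` section might be applied to the standard fields, here is a minimal Python sketch. The `apply_ui_field_behaviour` helper is hypothetical and not part of the PR; the list of standard fields is taken from the fields named in this thread, and the default label (the capitalized field name) is an assumption.

```python
from typing import Any

# Standard connection fields as named in this thread.
STANDARD_FIELDS = ["host", "schema", "login", "password", "port", "extra"]


def apply_ui_field_behaviour(behaviour: dict[str, Any]) -> list[dict[str, str]]:
    """Return the visible standard fields with their labels and placeholders."""
    hidden = set(behaviour.get("hidden-fields", []))
    relabeling = behaviour.get("relabeling", {})
    placeholders = behaviour.get("placeholders", {})

    form: list[dict[str, str]] = []
    for name in STANDARD_FIELDS:
        if name in hidden:
            continue  # hidden-fields: drop the field from the form entirely
        form.append(
            {
                "name": name,
                # relabeling: override the label; the capitalized default is an assumption
                "label": relabeling.get(name, name.capitalize()),
                # placeholders: optional hint shown in the empty input
                "placeholder": str(placeholders.get(name, "")),
            }
        )
    return form


# The Postgres example from above, plus a port placeholder as mentioned in the list.
POSTGRES_BEHAVIOUR = {
    "hidden-fields": [],
    "relabeling": {"schema": "Database"},
    "placeholders": {"port": "5432"},
}
```

Applied to `POSTGRES_BEHAVIOUR`, all six standard fields stay visible, `schema` is shown as "Database", and `port` carries the 5432 hint.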

As for why this schema was chosen, check the comparison with alternatives
in the PR description: https://github.com/apache/airflow/pull/60410


Current Status

I have a POC in: https://github.com/apache/airflow/pull/60410 where I
chose two pilot providers of varying difficulty: HTTP and SMTP (HTTP is
easy, just a vanilla form, but SMTP has some hidden fields).


Benefits this will offer

- Once complete, the API server won't import any hook classes for UI
  rendering, leading to faster startup
- Provider dependencies don't affect API server
- YAML is easier to read/write than python functions for form metadata

Would love feedback on:
1. Schema design - does it cover your use cases?
2. Any missing field types or validators?

The goal is to get the pilot providers in so we can start migrating
providers incrementally. Old way still works, so no rush for everyone to
migrate at once.

Thoughts?

Thanks & Regards,
Amogh Desai

---------------------------------------------------------------------

To unsubscribe, e-mail:[email protected]
For additional commands, e-mail:[email protected]

