Thanks for the suggestion to use jsonschema!
I updated the implementation to use jsonschema instead of the custom
format. The structure now looks like this, for example:
conn-fields:
  timeout:
    label: "Connection Timeout"
    description: "Timeout in seconds"
    schema:
      type: integer
      minimum: 1
      maximum: 300
      default: 30
As for the concern about GCP (14 fields, including string, integer,
boolean, and password types), I tested it and it works well (the PR is
updated). The code now uses the `schema` object for all JSON Schema
validation properties like minimum, maximum, pattern, enum, etc., while
keeping UI metadata such as the label, the description, and whether the
field is sensitive at the top level. This aligns better with the React UI,
which already expects this format.
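To make the split concrete, here is a minimal, stdlib-only sketch of how a consumer might treat one `conn-fields` entry: UI metadata (`label`, `description`, and an assumed `sensitive` flag) stays at the top level, while a small subset of JSON Schema keywords (`type`, `minimum`, `maximum`, `enum`, `pattern`) under `schema` is checked against a submitted value. `split_field` and `validate_value` are hypothetical helpers, not code from the PR, and this is not the jsonschema library; a real implementation would more likely hand `schema` straight to `jsonschema.validate`.

```python
import re
from typing import Any

# The "timeout" entry from the YAML above, as a YAML loader would return it.
TIMEOUT_FIELD: dict[str, Any] = {
    "label": "Connection Timeout",
    "description": "Timeout in seconds",
    "schema": {"type": "integer", "minimum": 1, "maximum": 300, "default": 30},
}

# Top-level keys treated as UI-only metadata ("sensitive" is an assumed name).
UI_KEYS = {"label", "description", "sensitive"}

# JSON Schema type names mapped to Python types (subset only).
_TYPES: dict[str, Any] = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


def split_field(field: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate UI-only metadata from the JSON Schema used for validation."""
    ui = {k: v for k, v in field.items() if k in UI_KEYS}
    return ui, field.get("schema", {})


def validate_value(value: Any, schema: dict[str, Any]) -> list[str]:
    """Check a value against type/minimum/maximum/enum/pattern; return errors."""
    expected = schema.get("type")
    if expected is not None:
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        if isinstance(value, bool) and expected != "boolean":
            return [f"expected {expected}, got boolean"]
        if not isinstance(value, _TYPES[expected]):
            return [f"expected {expected}, got {type(value).__name__}"]
    errors: list[str] = []
    if "minimum" in schema and value < schema["minimum"]:
        errors.append(f"{value} is less than minimum {schema['minimum']}")
    if "maximum" in schema and value > schema["maximum"]:
        errors.append(f"{value} is greater than maximum {schema['maximum']}")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{value!r} is not one of {schema['enum']}")
    # JSON Schema "pattern" is unanchored, which matches re.search semantics.
    if "pattern" in schema and not re.search(schema["pattern"], value):
        errors.append(f"{value!r} does not match {schema['pattern']!r}")
    return errors


ui, schema = split_field(TIMEOUT_FIELD)
print(ui["label"])                  # Connection Timeout
print(validate_value(30, schema))   # []
print(validate_value(500, schema))  # ['500 is greater than maximum 300']
```

Keeping the UI keys out of `schema` means the `schema` dict can be passed to any standard JSON Schema validator unchanged.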
Thanks & Regards,
Amogh Desai
On Fri, Jan 16, 2026 at 12:50 PM Amogh Desai <[email protected]>
wrote:
Ash -
Good catch on the GCP concern. I checked it and this is what it uses:
@classmethod
def get_connection_form_widgets(cls) -> dict[str, Any]:
    """Return connection widgets to add to connection form."""
    from flask_appbuilder.fieldwidgets import BS3PasswordFieldWidget, BS3TextFieldWidget
    from flask_babel import lazy_gettext
    from wtforms import BooleanField, IntegerField, PasswordField, StringField
    from wtforms.validators import NumberRange

    return {
        "project": StringField(lazy_gettext("Project Id"), widget=BS3TextFieldWidget()),
        "key_path": StringField(lazy_gettext("Keyfile Path"), widget=BS3TextFieldWidget()),
        "keyfile_dict": PasswordField(lazy_gettext("Keyfile JSON"), widget=BS3PasswordFieldWidget()),
        "credential_config_file": StringField(
            lazy_gettext("Credential Configuration File"), widget=BS3TextFieldWidget()
        ),
        "scope": StringField(lazy_gettext("Scopes (comma separated)"), widget=BS3TextFieldWidget()),
        "key_secret_name": StringField(
            lazy_gettext("Keyfile Secret Name (in GCP Secret Manager)"), widget=BS3TextFieldWidget()
        ),
        "key_secret_project_id": StringField(
            lazy_gettext("Keyfile Secret Project Id (in GCP Secret Manager)"), widget=BS3TextFieldWidget()
        ),
        "num_retries": IntegerField(
            lazy_gettext("Number of Retries"),
            validators=[NumberRange(min=0)],
            widget=BS3TextFieldWidget(),
            default=5,
        ),
        "impersonation_chain": StringField(
            lazy_gettext("Impersonation Chain"), widget=BS3TextFieldWidget()
        ),
        "idp_issuer_url": StringField(
            lazy_gettext("IdP Token Issue URL (Client Credentials Grant Flow)"),
            widget=BS3TextFieldWidget(),
        ),
        "client_id": StringField(
            lazy_gettext("Client ID (Client Credentials Grant Flow)"), widget=BS3TextFieldWidget()
        ),
        "client_secret": StringField(
            lazy_gettext("Client Secret (Client Credentials Grant Flow)"),
            widget=BS3PasswordFieldWidget(),
        ),
        "idp_extra_parameters": StringField(
            lazy_gettext("IdP Extra Request Parameters"), widget=BS3TextFieldWidget()
        ),
        "is_anonymous": BooleanField(
            lazy_gettext("Anonymous credentials (ignores all other settings)"), default=False
        ),
    }

@classmethod
def get_ui_field_behaviour(cls) -> dict[str, Any]:
    """Return custom field behaviour."""
    return {
        "hidden_fields": ["host", "schema", "login", "password", "port", "extra"],
        "relabeling": {},
    }
All of these are covered by my schema.
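As an illustration of that claim, here is a hedged sketch of how a few of the GCP widgets above could be expressed in the jsonschema-based `conn-fields` layout, written as the Python dict a YAML loader would produce. The `sensitive` key is an assumed name for the "sensitive or not" flag, keeping `default` inside `schema` follows the `timeout` example rather than a confirmed spec, and whether a masked field needs both `sensitive` and `format: password` is left open here. The helper `wtforms_equivalent` is hypothetical and only names the WTForms class each entry stands in for.

```python
from typing import Any

# Hypothetical conn-fields entries for a subset of the GCP widgets above.
GCP_CONN_FIELDS: dict[str, dict[str, Any]] = {
    "project": {"label": "Project Id", "schema": {"type": "string"}},
    "keyfile_dict": {
        "label": "Keyfile JSON",
        "sensitive": True,  # was PasswordField with BS3PasswordFieldWidget
        "schema": {"type": "string", "format": "password"},
    },
    "num_retries": {
        "label": "Number of Retries",
        # NumberRange(min=0) and default=5 carried over from the IntegerField
        "schema": {"type": "integer", "minimum": 0, "default": 5},
    },
    "is_anonymous": {
        "label": "Anonymous credentials (ignores all other settings)",
        "schema": {"type": "boolean", "default": False},
    },
}

# WTForms class that each schema shape stands in for (illustrative mapping).
_WTFORMS_BY_TYPE = {"string": "StringField", "integer": "IntegerField", "boolean": "BooleanField"}


def wtforms_equivalent(field: dict[str, Any]) -> str:
    """Name the WTForms field class the schema entry replaces."""
    schema = field["schema"]
    if schema.get("format") == "password":
        return "PasswordField"
    return _WTFORMS_BY_TYPE[schema["type"]]


for name, field in GCP_CONN_FIELDS.items():
    print(f"{name}: {wtforms_equivalent(field)}")
```

The `NumberRange(min=0)` validator becomes a plain `minimum: 0` keyword, so no Python validator object has to be imported to render or check the form.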
I also checked what the React UI currently supports, and this is what I found:
Types:
- string - Text input
- integer - Number input
- number - Number input
- boolean - Checkbox
- object - JSON object editor
- array - Array input
String Formats:
- format: "password" - Masked password field
- format: "multiline" - Textarea
- format: "date" - Date picker
- format: "date-time" - DateTime picker
- format: "time" - Time picker
Array Types
All of this comes from the field selector logic:
https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/components/FlexibleForm/FieldSelector.tsx#L58-L92
Fields are selected based on:
- `schema.type` (string, integer, boolean, array, object)
- `schema.format` (password, multiline, date, date-time, time, email, url)
- `schema.enum` (if present, a dropdown select)
So essentially anything with a `type`, `format`, and `enum` defined can be
handled by the React UI. That said, maybe I should try adopting the
jsonschema format here.
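The selection rules above can be restated as a small Python function, which is handy for checking how a given `conn-fields` schema would be rendered. This paraphrases the three listed rules; it is not a port of FieldSelector.tsx. The widget names are descriptive labels rather than real component names, and the precedence (enum first, then string format, then type) is an assumption based on the list.

```python
from typing import Any

# Descriptive widget labels; they are not the real React component names.
FORMAT_WIDGETS = {
    "password": "masked password field",
    "multiline": "textarea",
    "date": "date picker",
    "date-time": "datetime picker",
    "time": "time picker",
}

TYPE_WIDGETS = {
    "string": "text input",
    "integer": "number input",
    "number": "number input",
    "boolean": "checkbox",
    "object": "JSON object editor",
    "array": "array input",
}


def select_widget(schema: dict[str, Any]) -> str:
    """Pick a widget from schema.enum, schema.format and schema.type (assumed order)."""
    if "enum" in schema:
        return "dropdown select"
    fmt = schema.get("format")
    if schema.get("type") == "string" and fmt in FORMAT_WIDGETS:
        return FORMAT_WIDGETS[fmt]
    # Formats not listed above (e.g. email, url) fall through to the type-based
    # widget; a missing or unknown type falls back to a plain text input.
    return TYPE_WIDGETS.get(schema.get("type", "string"), "text input")


print(select_widget({"type": "integer", "minimum": 1}))         # number input
print(select_widget({"type": "string", "format": "password"}))  # masked password field
print(select_widget({"type": "string", "enum": ["a", "b"]}))    # dropdown select
```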
Thanks & Regards,
Amogh Desai
On Fri, Jan 16, 2026 at 12:36 PM Amogh Desai <[email protected]>
wrote:
Jarek -
Re backcompat: yes, I already have the fallback in place in my POC. The
discovery code first tries to load the metadata from YAML, and if that
fails, it falls back to the *python method* flow to discover the metadata.
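A minimal sketch of that fallback order, assuming the parsed provider.yaml `connection-types` entry is already available as a dict and that "fails" means the entry has no `conn-fields` / `ui-field-behaviour` keys. `discover_connection_metadata` and `import_hook_class` are hypothetical names, not the functions in the POC; the hook importer is injectable so the fallback can be exercised without a real provider installed.

```python
import importlib
from typing import Any, Callable


def import_hook_class(dotted_path: str) -> type:
    """Import a class from a dotted path like 'pkg.module.ClassName'."""
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def discover_connection_metadata(
    entry: dict[str, Any],
    import_hook: Callable[[str], type] = import_hook_class,
) -> dict[str, Any]:
    """Prefer static YAML metadata; fall back to the legacy hook classmethods."""
    if "conn-fields" in entry or "ui-field-behaviour" in entry:
        # Fast path: the hook class is never imported.
        return {
            "source": "yaml",
            "ui_field_behaviour": entry.get("ui-field-behaviour", {}),
            "conn_fields": entry.get("conn-fields", {}),
        }
    # Slow path: import the hook and call the old Python methods if defined.
    # (In a real provider, get_connection_form_widgets returns WTForms objects.)
    hook_cls = import_hook(entry["hook-class-name"])
    behaviour = getattr(hook_cls, "get_ui_field_behaviour", lambda: {})()
    widgets = getattr(hook_cls, "get_connection_form_widgets", lambda: {})()
    return {"source": "python", "ui_field_behaviour": behaviour, "conn_fields": widgets}


# Usage with a stand-in legacy hook (hypothetical, for illustration only).
class LegacyHook:
    @classmethod
    def get_ui_field_behaviour(cls) -> dict[str, Any]:
        return {"hidden_fields": ["port"], "relabeling": {}}


yaml_entry = {"connection-type": "http", "conn-fields": {"timeout": {"schema": {"type": "integer"}}}}
legacy_entry = {"connection-type": "legacy", "hook-class-name": "example.LegacyHook"}

print(discover_connection_metadata(yaml_entry)["source"])  # yaml
print(discover_connection_metadata(legacy_entry, import_hook=lambda _: LegacyHook)["source"])  # python
```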
Re the bigger vision of API servers without providers: I love where you
are going with this, but I think we need to split up the tasks because we
aren't there yet. Let me explain.
Your idea (discover providers via the triggerer, store them in the DB, and
have the API server read from the DB) might work for connection forms, but
there are a few more reasons why the API server will continue to need the
ProvidersManager:
1. Auth Managers
2. Secrets Backends
3. Providers List Endpoint: maybe we should get rid of this? I don't know
who the consumer of this endpoint is.
So an API server without providers is harder than just connection forms,
and we won't be there until we figure out the 3 points above.
I suggest we do this instead:
Phase 1: Connection forms from YAML, to establish a foundation for the
future
Phase 2: The DB storage phase: decide whether the triggerer (or some other
component) populates the DB (maybe not the triggerer, because we
eventually do not want it to have DB access)
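To make Phase 2 a bit more concrete, here is a rough sketch of the storage contract, independent of which component ends up populating it: the writer serialises per-connection-type metadata to JSON and stores it, and the API server only ever reads JSON rows back, so it never needs provider code. The table name `connection_form_metadata` and its columns are hypothetical, and `sqlite3` merely stands in for the real metadata DB and ORM.

```python
import json
import sqlite3
from typing import Any


def create_table(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one JSON document per connection type.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS connection_form_metadata ("
        "conn_type TEXT PRIMARY KEY, provider TEXT NOT NULL, metadata TEXT NOT NULL)"
    )


def publish(conn: sqlite3.Connection, provider: str, conn_type: str, metadata: dict[str, Any]) -> None:
    """Writer side: run by whichever component has the providers installed."""
    # INSERT OR REPLACE keeps a single row per conn_type, so re-publishing
    # after a provider upgrade overwrites the previous definition.
    conn.execute(
        "INSERT OR REPLACE INTO connection_form_metadata (conn_type, provider, metadata) "
        "VALUES (?, ?, ?)",
        (conn_type, provider, json.dumps(metadata)),
    )
    conn.commit()


def read_all(conn: sqlite3.Connection) -> dict[str, dict[str, Any]]:
    """Reader side (API server): only JSON comes back, no provider imports."""
    rows = conn.execute(
        "SELECT conn_type, metadata FROM connection_form_metadata ORDER BY conn_type"
    )
    return {conn_type: json.loads(metadata) for conn_type, metadata in rows}


db = sqlite3.connect(":memory:")
create_table(db)
publish(db, "apache-airflow-providers-postgres", "postgres",
        {"ui-field-behaviour": {"hidden-fields": [], "relabeling": {}}})
# Re-publishing the same connection type replaces the earlier row.
publish(db, "apache-airflow-providers-postgres", "postgres",
        {"ui-field-behaviour": {"hidden-fields": [], "relabeling": {"schema": "Database"}}})
publish(db, "apache-airflow-providers-http", "http", {"conn-fields": {}})
print(sorted(read_all(db)))  # ['http', 'postgres']
```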
Does that sound reasonable? What do you think?
Thanks & Regards,
Amogh Desai
On Fri, Jan 16, 2026 at 4:46 AM Jarek Potiuk <[email protected]> wrote:
I think the Triggerer is better than the Scheduler for persisting connection
definitions to the DB. Essentially, the Triggerer is the only component that
needs DB access and also needs to have providers installed. Any of the
providers might implement Triggers, and those are very tightly coupled with
"Hooks" and "Operators". The Scheduler only really needs **scheduler
plugins** (Timetables and such) and **executors** (which we eventually want
to split off from the current "worker" providers). It does not need "worker
providers".
IMHO, in many of our discussions, this long-term plan / vision has been the
most appealing:
* api-server: only needs distributions that are "ui plugins" (no providers)
* scheduler: only needs distributions that are "scheduler plugins" (e.g.
  timetables) and "executors"
* worker: only needs "worker/triggerer providers" (i.e. essentially hooks
  and operators) and "worker plugins" (e.g. macros)
* triggerer: only needs "worker/triggerer providers" (as in workers), and
  possibly "triggerer plugins" if we ever need them
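The "North Star" split above can be written down as a simple lookup table, which could, for example, back a check that a deployment of a given component installs only the distribution kinds it needs. The kind labels paraphrase the bullets, and both the table and the `unneeded` check are hypothetical illustrations, not an existing Airflow mechanism.

```python
# Distribution kinds each component needs, paraphrasing the list above.
COMPONENT_NEEDS: dict[str, frozenset[str]] = {
    "api-server": frozenset({"ui-plugin"}),
    "scheduler": frozenset({"scheduler-plugin", "executor"}),
    "worker": frozenset({"worker-provider", "worker-plugin"}),
    # "triggerer-plugin" is speculative ("if we ever need them").
    "triggerer": frozenset({"worker-provider", "triggerer-plugin"}),
}


def unneeded(component: str, installed_kinds: set[str]) -> set[str]:
    """Return installed distribution kinds that the component does not need."""
    return set(installed_kinds) - COMPONENT_NEEDS[component]


print(sorted(unneeded("api-server", {"ui-plugin", "worker-provider"})))  # ['worker-provider']
```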
Eventually, and optionally, each of those ("api-server", "scheduler",
"worker", "triggerer") should be a separate distribution, each with its own
dependencies. But this only makes sense if we find that those dependencies
could be very different between them. It's likely this will not happen,
because the dependency sets of those "components" will be very close once
we finalize the current task-sdk isolation work.
Of course we cannot do it all at once, and it will take quite some time to
get there.
But I think we should treat it as a "North Star" that we look at whenever
we make "architecture" decisions, and every decision we make should bring
us closer to this "North Star".
Also, just to note: with the "shared" libraries concept we already have,
and with "uv workspace" in our monorepo, we have ALL the mechanisms needed
to make it happen, and to do it in a very maintainable way with very little
overhead and virtually no change to the regular development workflow. For
example, the shared libraries concept could be used to share common code
between apache-airflow-providers-cncf-kubernetes (the worker provider,
essentially KPO, installable for worker and triggerer) and the (future)
apache-airflow-executors-cncf-kubernetes (the executor, installable for the
scheduler). The same goes for the amazon worker provider/executor split and
the edge worker provider/executor split. All of that is doable.
J.
On Thu, Jan 15, 2026 at 10:23 PM Jens Scheffler <[email protected]>
wrote:
Also +100 from my side.
We discussed exactly this in an Airflow 3 dev call; I was looking for the
notes... that was when we discussed the future component split. Found a
reference in:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-22August2024
```
**Plan for Decoupling Providers' Connections metadata from FAB (Jens
Scheffler <https://cwiki.apache.org/confluence/display/~jscheffl>)**
* Jens created this draft PR <https://github.com/apache/airflow/pull/41656>
  with the POC for it and presented it on the call.
* Jarek <https://cwiki.apache.org/confluence/display/~potiuk> proposed the
  idea of dumping the JSON/YAML with connection fields in the Database or
  loading it via package metadata so we don't load all the dependencies on
  the webserver.
* We will need some plan for external providers on how they can define
  connections or register them.
* The POC successfully proved that we can separate the connection metadata
  from FAB
* *Action Item*: Jens <https://cwiki.apache.org/confluence/display/~jscheffl>
  to create a GitHub issue for decoupling the Connection metadata from FAB
```
Also, on Sep 19th 2024 we had an overview of which pieces of the providers
are needed where:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-19September2024
The follow-up notes are in a GitHub issue:
https://github.com/apache/airflow/issues/42016
One main point was the assumption that all providers need to be available
on the Scheduler (? I think that changed?), so that the connection form
definitions are persisted to the DB there and the API server can read them
directly from there: no need to install providers on the API Server!
Looking forward to the contribution... I assume no VOTE is needed :-D
Jens
On 1/15/26 15:52, Ash Berlin-Taylor wrote:
As an idea/structure, I think it's certainly the right way to go: needing
the code, and the instantiated widget classes, only to (I suspect) throw
them away in the new React UI certainly seems like a silly idea now.
In your POC, I don't think you have the ability yet to define the extra
fields that, for instance, the Google Cloud connection has.
As for the schema we need to express: I'd say we should look at what the
React UI currently supports?
-ash
On 15 Jan 2026, at 14:07, Amogh Desai <[email protected]> wrote:
Hi All,
I wanted to get feedback on something I have been twiddling with. For
context, the API server has to import every single hook class from all
providers just to render connection forms in the UI. This is because the UI
metadata (what fields to show, labels, validators, etc.) lives in Python
functions like `get_connection_form_widgets()` and
`get_ui_field_behaviour()`, which are defined on the hook classes.
This means:
- API server startup imports 100+ hook classes it might not actually need
- Slower startup and a heavier memory footprint
- Poor client-server separation (why does the API server need to know
  about pyodbc just to show a UI form?)
My proposal
Move the UI metadata from Python code to something static / declarative,
like YAML. I want to add this information to the provider.yaml file that
every provider already has. For example:
class PostgresHook(BaseHook):
    @classmethod
    def get_ui_field_behaviour(cls) -> dict[str, Any]:
        return {
            "hidden_fields": [],
            "relabeling": {
                "schema": "Database",
            },
        }
Will become:
connection-types:
  - connection-type: postgres
    hook-class-name: airflow.providers.postgres.hooks.postgres.PostgresHook
    ui-field-behaviour:
      hidden-fields: []
      relabeling:
        schema: "Database"
    conn-fields:
      sslmode:
        type: string
        label: SSL Mode
        enum: ["disable", "prefer", "require"]
        default: "prefer"
      timeout:
        type: integer
        label: Timeout
        range: [1, 300]
        default: 30
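Since the thread later moved to JSON Schema, here is a hedged sketch of how a field written in this first-draft format could be mechanically translated into the later layout (UI metadata at the top level, validation under `schema`). The key mapping (`range` to inclusive `minimum`/`maximum`, `min-length`/`max-length` to `minLength`/`maxLength`) is my reading of the two formats, not code from the PR, and `draft_to_jsonschema` is a hypothetical helper.

```python
from typing import Any

UI_KEYS = ("label", "description", "sensitive")  # "sensitive" is an assumed key
# Draft-format keys renamed to their JSON Schema equivalents.
RENAMED = {"min-length": "minLength", "max-length": "maxLength"}


def draft_to_jsonschema(field: dict[str, Any]) -> dict[str, Any]:
    """Convert one draft conn-fields entry to the label + schema layout."""
    out: dict[str, Any] = {k: field[k] for k in UI_KEYS if k in field}
    schema: dict[str, Any] = {}
    for key, value in field.items():
        if key in UI_KEYS:
            continue
        if key == "range":
            # range: [lo, hi] is assumed to be inclusive on both ends
            schema["minimum"], schema["maximum"] = value
        else:
            # type, enum, pattern and default keep their names
            schema[RENAMED.get(key, key)] = value
    out["schema"] = schema
    return out


draft_timeout = {"type": "integer", "label": "Timeout", "range": [1, 300], "default": 30}
print(draft_to_jsonschema(draft_timeout))
# {'label': 'Timeout', 'schema': {'type': 'integer', 'minimum': 1, 'maximum': 300, 'default': 30}}
```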
The schema gains two new sections:
1. ui-field-behaviour
   - Used to customize the standard connection fields (host, port, login,
     etc.)
   - hidden-fields: Hide some fields
   - relabeling: Change the labels of some fields (like schema -> Database
     above)
   - placeholders: Show hints in the form (port 5432, for example)
2. conn-fields
   - Can be used to define custom fields stored in Connection.extra
   - You can define inline validators like enum, range, pattern,
     min-length, max-length
   - Will support the standard wtforms string, integer, boolean, and number
     types
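For the `ui-field-behaviour` section, a consumer only needs to post-process the fixed set of standard connection fields. A minimal sketch, assuming the YAML keys described above (`hidden-fields`, `relabeling`, `placeholders`); the field names come from the GCP `hidden_fields` list earlier in the thread, while the capitalised default labels and the `build_standard_fields` helper are hypothetical. The output is plain JSON-serialisable data, so no hook import is involved.

```python
from typing import Any

# Standard Connection form fields; the default labels below are illustrative.
STANDARD_FIELDS = ["host", "schema", "login", "password", "port", "extra"]


def build_standard_fields(behaviour: dict[str, Any]) -> list[dict[str, Any]]:
    """Apply hidden-fields, relabeling and placeholders to the standard fields."""
    hidden = set(behaviour.get("hidden-fields", []))
    relabel = behaviour.get("relabeling", {})
    placeholders = behaviour.get("placeholders", {})
    fields = []
    for name in STANDARD_FIELDS:
        if name in hidden:
            continue  # hidden fields are simply not rendered
        field = {"name": name, "label": relabel.get(name, name.capitalize())}
        if name in placeholders:
            field["placeholder"] = placeholders[name]
        fields.append(field)
    return fields


postgres_behaviour = {
    "hidden-fields": [],
    "relabeling": {"schema": "Database"},
    "placeholders": {"port": "5432"},
}
for f in build_standard_fields(postgres_behaviour):
    print(f)
```

A provider like GCP that hides all six standard fields would simply get an empty list back.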
As for why this schema was chosen, see the comparison with alternatives in
the PR description: https://github.com/apache/airflow/pull/60410
Current Status
I have a POC in https://github.com/apache/airflow/pull/60410 where I chose
two pilot providers of varying difficulty: HTTP and SMTP (HTTP is easy,
just a vanilla form, but SMTP has some hidden fields).
Benefits this will offer
- Once complete, the API server won't import any hook classes for UI
  rendering, leading to faster startup
- Provider dependencies don't affect the API server
- YAML is easier to read/write than Python functions for form metadata
Would love feedback on:
1. Schema design - does it cover your use cases?
2. Any missing field types or validators?
The goal is to get the pilot providers in so we can start migrating
providers incrementally. The old way still works, so there's no rush for
everyone to migrate at once.
Thoughts?
Thanks & Regards,
Amogh Desai