> There are a few more reasons why the API server will continue to need the ProvidersManager:
Yeah, I was aware there are likely a few more things I forgot, but this idea extends to those nicely:

1. Auth Managers -> I consider this an api-server plugin :), or possibly a separate type of distribution (apache-airflow-auth-manager); again, this will work nicely with the "shared" library.
2. Secrets Backends -> not sure if that is needed for the api-server (maybe just for configuration retrieval?); this again can be a plugin, or a separate distribution (apache-airflow-secrets-backend).
3. Providers List Endpoint: maybe we should get rid of this? -> Eventually this should be part of the same Triggerer DB storage: the triggerer should store the list of installed providers in the DB. What we currently have in the api-server is already somewhat wrong, because even now we can potentially have different providers installed on the api-server than on workers/triggerers, and only those installed on the api-server will show up. Switching it to read from the DB, updated by the Triggerer (also including team_id, as there might be different sets of providers for different teams), will make it "correct" (eventually).

But yeah, we can definitely defer any of that to be done later, if we do not find it "easier" to do it together. Absolutely no pressure there; I just wanted to make sure the "North Star" is commonly agreed, so that we know where we are going :). We can definitely proceed with the current POC "as is".

J.

On Fri, Jan 16, 2026 at 11:11 AM Amogh Desai wrote:

> Thanks for the suggestion to use jsonschema!
>
> I updated the implementation to use jsonschema instead of the custom format. Now the structure looks like this, for example:
>
> conn-fields:
>   timeout:
>     label: "Connection Timeout"
>     description: "Timeout in seconds"
>     schema:
>       type: integer
>       minimum: 1
>       maximum: 300
>       default: 30
>
> As for the concerns regarding GCP (14 fields including string, int, boolean, and password), I tested it and it works well (updated on PR).
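A minimal sketch of how a value for the `timeout` field above could be checked against its `schema` object. To stay dependency-free, it hand-rolls only the keywords discussed in this thread (`type`, `minimum`, `maximum`, `enum`, `pattern`) instead of calling the real `jsonschema` package, and it assumes the YAML has already been parsed into a dict. `validate_extra_field` is a hypothetical helper, not code from the PR.

```python
import re
from typing import Any

# JSON Schema type names mapped to Python types (only the subset used in this thread).
_JSON_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "integer": (int,),
    "number": (int, float),
    "boolean": (bool,),
}


def validate_extra_field(value: Any, field: dict[str, Any]) -> list[str]:
    """Return error messages for `value`; an empty list means it is valid.

    Hypothetical helper: checks only type, minimum, maximum, enum and pattern.
    """
    schema = field.get("schema", {})
    label = field.get("label", "field")
    errors: list[str] = []

    expected = schema.get("type")
    if expected in _JSON_TYPES:
        # bool is a subclass of int in Python, so reject it explicitly for numeric types.
        is_bool = isinstance(value, bool)
        if not isinstance(value, _JSON_TYPES[expected]) or (is_bool and expected != "boolean"):
            errors.append(f"{label}: expected {expected}, got {type(value).__name__}")
            return errors

    if "minimum" in schema and value < schema["minimum"]:
        errors.append(f"{label}: {value} is below minimum {schema['minimum']}")
    if "maximum" in schema and value > schema["maximum"]:
        errors.append(f"{label}: {value} is above maximum {schema['maximum']}")
    if "enum" in schema and value not in schema["enum"]:
        errors.append(f"{label}: {value!r} is not one of {schema['enum']}")
    if "pattern" in schema and not re.search(schema["pattern"], str(value)):
        errors.append(f"{label}: {value!r} does not match {schema['pattern']!r}")
    return errors


# The `timeout` entry from the YAML above, as it would look after parsing.
timeout_field = {
    "label": "Connection Timeout",
    "description": "Timeout in seconds",
    "schema": {"type": "integer", "minimum": 1, "maximum": 300, "default": 30},
}
```

Keeping `label` and `description` outside `schema` presumably lets the `schema` dict be handed unchanged to a real JSON Schema validator, while the UI-only metadata stays separate.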
> The code now uses the schema object for all jsonschema validation properties (minimum, maximum, pattern, enum, etc.), while keeping UI metadata such as label, description, and whether the field is sensitive at the top level. This aligns better with the React UI, which already expects this format.
>
> Thanks & Regards,
> Amogh Desai
>
>
> On Fri, Jan 16, 2026 at 12:50 PM Amogh Desai wrote:
>
> > Ash -
> >
> > Good catch on the GCP concern. I checked it and this is what it uses:
> >
> >     @classmethod
> >     def get_connection_form_widgets(cls) -> dict[str, Any]:
> >         """Return connection widgets to add to connection form."""
> >         from flask_appbuilder.fieldwidgets import BS3PasswordFieldWidget, BS3TextFieldWidget
> >         from flask_babel import lazy_gettext
> >         from wtforms import BooleanField, IntegerField, PasswordField, StringField
> >         from wtforms.validators import NumberRange
> >
> >         return {
> >             "project": StringField(lazy_gettext("Project Id"), widget=BS3TextFieldWidget()),
> >             "key_path": StringField(lazy_gettext("Keyfile Path"), widget=BS3TextFieldWidget()),
> >             "keyfile_dict": PasswordField(lazy_gettext("Keyfile JSON"), widget=BS3PasswordFieldWidget()),
> >             "credential_config_file": StringField(
> >                 lazy_gettext("Credential Configuration File"), widget=BS3TextFieldWidget()
> >             ),
> >             "scope": StringField(lazy_gettext("Scopes (comma separated)"), widget=BS3TextFieldWidget()),
> >             "key_secret_name": StringField(
> >                 lazy_gettext("Keyfile Secret Name (in GCP Secret Manager)"), widget=BS3TextFieldWidget()
> >             ),
> >             "key_secret_project_id": StringField(
> >                 lazy_gettext("Keyfile Secret Project Id (in GCP Secret Manager)"), widget=BS3TextFieldWidget()
> >             ),
> >             "num_retries": IntegerField(
> >                 lazy_gettext("Number of Retries"),
> >                 validators=[NumberRange(min=0)],
> >                 widget=BS3TextFieldWidget(),
> >                 default=5,
> >             ),
> >             "impersonation_chain": StringField(
> >                 lazy_gettext("Impersonation Chain"), widget=BS3TextFieldWidget()
> >             ),
> >             "idp_issuer_url": StringField(
> >                 lazy_gettext("IdP Token Issue URL (Client Credentials Grant Flow)"),
> >                 widget=BS3TextFieldWidget(),
> >             ),
> >             "client_id": StringField(
> >                 lazy_gettext("Client ID (Client Credentials Grant Flow)"), widget=BS3TextFieldWidget()
> >             ),
> >             "client_secret": StringField(
> >                 lazy_gettext("Client Secret (Client Credentials Grant Flow)"),
> >                 widget=BS3PasswordFieldWidget(),
> >             ),
> >             "idp_extra_parameters": StringField(
> >                 lazy_gettext("IdP Extra Request Parameters"), widget=BS3TextFieldWidget()
> >             ),
> >             "is_anonymous": BooleanField(
> >                 lazy_gettext("Anonymous credentials (ignores all other settings)"), default=False
> >             ),
> >         }
> >
> >     @classmethod
> >     def get_ui_field_behaviour(cls) -> dict[str, Any]:
> >         """Return custom field behaviour."""
> >         return {
> >             "hidden_fields": ["host", "schema", "login", "password", "port", "extra"],
> >             "relabeling": {},
> >         }
> >
> > All of these are covered by my schema.
> >
> > I also checked what the React UI supports as of now, and this is what I found:
> >
> > string - Text input
> > integer - Number input
> > number - Number input
> > boolean - Checkbox
> > object - JSON object editor
> > array - Array input
> >
> > String formats:
> > format: "password" - Masked password field
> > format: "multiline" - Textarea
> > format: "date" - Date picker
> > format: "date-time" - DateTime picker
> > format: "time" - Time picker
> >
> > Array Types
> >
> > All of this comes from the field selector logic:
> > https://github.com/apache/airflow/blob/main/airflow-core/src/airflow/ui/src/components/FlexibleForm/FieldSelector.tsx#L58-L92
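The selection rules described here, plus the `schema.enum` rule mentioned in the same message, can be summarized as a small lookup. This is a Python approximation for illustration only, not a port of `FieldSelector.tsx`; the precedence (enum, then format, then type) and the widget names are assumptions.

```python
from typing import Any

# Widget names are illustrative labels, not identifiers from the React code.
_FORMAT_WIDGETS = {
    "password": "masked password field",
    "multiline": "textarea",
    "date": "date picker",
    "date-time": "datetime picker",
    "time": "time picker",
}

_TYPE_WIDGETS = {
    "string": "text input",
    "integer": "number input",
    "number": "number input",
    "boolean": "checkbox",
    "object": "JSON object editor",
    "array": "array input",
}


def select_widget(schema: dict[str, Any]) -> str:
    """Pick a widget for a field schema (assumed precedence: enum, format, type)."""
    if "enum" in schema:
        return "dropdown select"
    field_type = schema.get("type", "string")
    fmt = schema.get("format")
    # Formats only refine string fields in this sketch.
    if field_type == "string" and fmt in _FORMAT_WIDGETS:
        return _FORMAT_WIDGETS[fmt]
    return _TYPE_WIDGETS.get(field_type, "text input")
```

Under this mapping, GCP's `keyfile_dict` `PasswordField` would be declared as `{"type": "string", "format": "password"}`, and `num_retries` as an `integer` with `minimum: 0`.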
> >
> > Fields are selected based on:
> > - `schema.type` (string, integer, boolean, array, object)
> > - `schema.format` (password, multiline, date, date-time, time, email, url)
> > - `schema.enum` (if present, a dropdown select)
> >
> > So essentially anything with a type, format, and enum defined can be handled by the React UI. That said, maybe I should try adopting the jsonschema format here.
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> >
> > On Fri, Jan 16, 2026 at 12:36 PM Amogh Desai wrote:
> >
> >> Jarek -
> >>
> >> Re backcompat: yeah, I already have the fallback in place in my POC. The discovery code will first try to load the metadata from YAML, and if it fails to do so, it will use the *python method* flow to discover the metadata.
> >>
> >> Re the bigger vision about API servers without providers: I love where you are going with this, but I think we need to split up the tasks because we aren't there yet. Let me explain -
> >>
> >> Your idea to discover providers via the triggerer, store them in the DB, and have the API server read from the DB might work for connection forms, but there are a few more reasons why the API server will continue to need the ProvidersManager:
> >>
> >> 1. Auth Managers
> >> 2. Secrets Backends
> >> 3. Providers List Endpoint: maybe we should get rid of this? IDK who the consumer of this endpoint is
> >>
> >> So an API server without providers is harder than just connection forms, and we aren't there yet until we figure out the 3 points above.
> >>
> >> I suggest we do this instead:
> >>
> >> Phase 1: Connection forms from YAML, to establish a foundation for the future
> >> Phase 2: The DB storage phase - decide who populates the DB, the Triggerer or something else (maybe not the triggerer, because we do not want it to have DB access eventually)
> >>
> >> Does that sound reasonable? What do you think?
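A sketch of the YAML-first discovery with the *python method* fallback described above. It assumes that "fails" means the `connection-types` entry has no `ui-field-behaviour`/`conn-fields` sections; the function names and the returned dict layout are hypothetical, not taken from the POC.

```python
import importlib
from typing import Any, Callable


def _import_hook_class(dotted_path: str) -> Any:
    """Import a hook class from its dotted path (the expensive step we want to avoid)."""
    module_name, _, class_name = dotted_path.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


def load_ui_metadata(
    entry: dict[str, Any],
    import_hook: Callable[[str], Any] = _import_hook_class,
) -> dict[str, Any]:
    """Return UI metadata for one `connection-types` entry from provider.yaml.

    Prefer the declarative YAML sections; fall back to the legacy classmethods.
    """
    if "ui-field-behaviour" in entry or "conn-fields" in entry:
        return {
            "source": "yaml",
            "ui_field_behaviour": entry.get("ui-field-behaviour", {}),
            "conn_fields": entry.get("conn-fields", {}),
        }
    # Legacy path: importing the hook pulls in all of its dependencies.
    hook_cls = import_hook(entry["hook-class-name"])
    behaviour_fn = getattr(hook_cls, "get_ui_field_behaviour", None)
    widgets_fn = getattr(hook_cls, "get_connection_form_widgets", None)
    return {
        "source": "python",
        "ui_field_behaviour": behaviour_fn() if behaviour_fn else {},
        "conn_fields": widgets_fn() if widgets_fn else {},
    }
```

Only the fallback branch imports the hook module, so once a provider ships the YAML sections, the API server never needs to import that provider's hook for form rendering.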
> >>
> >>
> >> Thanks & Regards,
> >> Amogh Desai
> >>
> >>
> >> On Fri, Jan 16, 2026 at 4:46 AM Jarek Potiuk wrote:
> >>
> >>> > One main assumption was that all providers need to be available on the Scheduler (? I think that changed?), so that the connection form definitions could be persisted to the DB there and the API server could read them directly from there - no need to install providers on the API Server!
> >>>
> >>> I think the Triggerer is better than the Scheduler for persisting connection definitions to the DB. Essentially, the Triggerer is the only component that needs DB access and also needs to have providers installed. Any of the providers might implement Triggers, and they are very tightly coupled with "Hooks" and "Operators". The Scheduler only really needs **scheduler plugins** (Timetables and such) and **executors** (which we eventually want to split off from the current "worker" providers). It does not need "worker providers".
> >>>
> >>> IMHO, in many of our discussions this long-term plan / vision is the most appealing:
> >>>
> >>> * api-server: only needs distributions that are "ui plugins" (no providers)
> >>> * scheduler: only needs distributions that are "scheduler plugins" (e.g. timetables) and "executors"
> >>> * worker: only needs "worker/triggerer providers" (i.e. essentially hooks and operators) and "worker plugins" (e.g. macros)
> >>> * triggerer: only needs "worker/triggerer providers" (as in workers), and possibly "triggerer plugins" if we ever need them
> >>>
> >>> Eventually, optionally, each of those ("api-server", "scheduler", "worker", "triggerer") should be a separate distribution, each with its own dependencies.
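A sketch of the DB-backed flow described here: the Triggerer, which already has providers installed, publishes connection-form metadata per team, and the API server only reads it. It uses an in-memory `sqlite3` database; the `connection_form` table, its columns, and the three functions are hypothetical, and real Airflow code would presumably go through its own metadata DB and ORM instead.

```python
import json
import sqlite3
from typing import Any

# Hypothetical table, invented for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS connection_form (
    team_id TEXT NOT NULL,
    connection_type TEXT NOT NULL,
    provider TEXT NOT NULL,
    metadata TEXT NOT NULL,
    PRIMARY KEY (team_id, connection_type)
)
"""


def publish_connection_forms(db: sqlite3.Connection, team_id: str, forms: dict[str, dict[str, Any]]) -> None:
    """Triggerer side: replace this team's rows with what is installed locally."""
    with db:  # commits on success, rolls back on error
        db.execute("DELETE FROM connection_form WHERE team_id = ?", (team_id,))
        db.executemany(
            "INSERT INTO connection_form VALUES (?, ?, ?, ?)",
            [
                (team_id, conn_type, form["provider"], json.dumps(form["metadata"]))
                for conn_type, form in forms.items()
            ],
        )


def read_connection_forms(db: sqlite3.Connection, team_id: str) -> dict[str, dict[str, Any]]:
    """API-server side: no provider imports, just a DB read."""
    rows = db.execute(
        "SELECT connection_type, provider, metadata FROM connection_form "
        "WHERE team_id = ? ORDER BY connection_type",
        (team_id,),
    )
    return {ct: {"provider": p, "metadata": json.loads(m)} for ct, p, m in rows}


def installed_providers(db: sqlite3.Connection, team_id: str) -> list[str]:
    """The per-team providers list, derived from the same table."""
    rows = db.execute(
        "SELECT DISTINCT provider FROM connection_form WHERE team_id = ? ORDER BY provider",
        (team_id,),
    )
    return [row[0] for row in rows]


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
```

Replacing a team's rows inside one transaction means readers on other connections see either the old or the new set of providers, not a mix; the `team_id` column covers teams with different providers installed.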
> >>> But this only makes sense if we find that those dependencies could be very different between them. It's likely this will not happen, because the dependency sets for each of those "components" will be very close once we finalize the current task-sdk isolation work.
> >>>
> >>> Of course we cannot do it all at once, and it will take quite some time to get there.
> >>>
> >>> But I think we should have it as a "North Star" that we look at when we make any "architecture" decisions, and every decision we make should bring us closer to this "North Star".
> >>>
> >>> Also, just to note: with the "shared" libraries concept we already have, and with "uv workspace" in our monorepo, we have ALL the mechanisms needed to make it happen, and to do it in a very maintainable way with very little overhead and virtually no change in the regular development workflow. For example, the shared libraries concept might be used to share common code between apache-airflow-providers-cncf-kubernetes (worker provider, essentially KPO, installable for worker and triggerer) and the (future) apache-airflow-executors-cncf-kubernetes (executor, installable for scheduler). Same for the amazon worker provider/executor split and the edge worker provider/executor split. All of that is doable.
> >>>
> >>> J.
> >>>
> >>>
> >>>
> >>> On Thu, Jan 15, 2026 at 10:23 PM Jens Scheffler wrote:
> >>>
> >>> > Also +100 from my side.
> >>> >
> >>> > We discussed exactly this in an Airflow 3 dev call; I was looking for the notes... that was when we discussed the component split in the future.
> >>> > Found a reference in:
> >>> >
> >>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-22August2024
> >>> >
> >>> > ```
> >>> >
> >>> > **Plan for Decoupling Providers's Connections metadata from FAB (Jens Scheffler <https://cwiki.apache.org/confluence/display/~jscheffl>)**
> >>> >
> >>> > * Jens created this draft PR <https://github.com/apache/airflow/pull/41656> with the POC for it and presented it on the call.
> >>> > * Jarek <https://cwiki.apache.org/confluence/display/~potiuk> proposed the idea of dumping the JSON/YAML with connection fields in the Database or loading it via package metadata so we don't load all the dependencies on the webserver.
> >>> > * We will need some plan for external providers on how they can define connections or register them.
> >>> > * The POC successfully proved that we can separate the connection metadata from FAB
> >>> > * /*Action Item*/: Jens <https://cwiki.apache.org/confluence/display/~jscheffl> to create a GitHub issue for decoupling the Connection metadata from FAB
> >>> >
> >>> > ```
> >>> >
> >>> > Also, on Sep 19th 2024 we had an overview of which pieces of the providers are needed where:
> >>> >
> >>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=308153072#Airflow3Devcall:MeetingNotes-19September2024
> >>> >
> >>> > The follow-up was noted in a GitHub ticket: https://github.com/apache/airflow/issues/42016
> >>> >
> >>> >
> >>> > One main assumption was that all providers need to be available on the Scheduler (? I think that changed?), so that the connection form definitions could be persisted to the DB there and the API server could read them directly from there - no need to install providers on the API Server!
> >>> >
> >>> > Looking forward to the contribution...
> >>> > I assume no VOTE needed :-D
> >>> >
> >>> > Jens
> >>> >
> >>> > On 1/15/26 15:52, Ash Berlin-Taylor wrote:
> >>> > > As an idea/structure I think it's certainly the right way to go. Needing the code, and the instantiated widget classes, only to (I suspect) throw them away in the new React UI certainly seems like a silly idea now.
> >>> > >
> >>> > > In your POC I don't think you have the ability yet to define the extra fields that, for instance, the Google Cloud connection has.
> >>> > >
> >>> > > As for the schema we need to express: I'd say we should look at what the React UI currently supports?
> >>> > >
> >>> > > -ash
> >>> > >
> >>> > >> On 15 Jan 2026, at 14:07, Amogh Desai wrote:
> >>> > >>
> >>> > >> Hi All,
> >>> > >>
> >>> > >> I wanted to get feedback on something I have been twiddling with. For context, the API server has to import every single hook class from all providers just to render connection forms in the UI. This is because the UI metadata (what fields to show, labels, validators, etc.) lives in Python functions like `get_connection_form_widgets()` and `get_ui_field_behaviour()`, which are defined on the hook classes.
> >>> > >>
> >>> > >> This means:
> >>> > >> - API server startup imports 100+ hook classes it might not actually need
> >>> > >> - Slower startup and a heavier memory footprint
> >>> > >> - Poor client-server separation (why does the API server need to know about pyodbc just to show a UI form?)
> >>> > >>
> >>> > >> My proposal
> >>> > >>
> >>> > >> Move the UI metadata from Python code to something static / declarative like YAML. I want to add this information in the provider.yaml file that every provider already has.
> >>> > >> For example -
> >>> > >>
> >>> > >> class PostgresHook(BaseHook):
> >>> > >>     @classmethod
> >>> > >>     def get_ui_field_behaviour(cls) -> dict[str, Any]:
> >>> > >>         return {
> >>> > >>             "hidden_fields": [],
> >>> > >>             "relabeling": {
> >>> > >>                 "schema": "Database",
> >>> > >>             },
> >>> > >>         }
> >>> > >>
> >>> > >> will become:
> >>> > >>
> >>> > >> connection-types:
> >>> > >>   - connection-type: postgres
> >>> > >>     hook-class-name: airflow.providers.postgres.hooks.postgres.PostgresHook
> >>> > >>
> >>> > >>     ui-field-behaviour:
> >>> > >>       hidden-fields: []
> >>> > >>       relabeling:
> >>> > >>         schema: "Database"
> >>> > >>
> >>> > >>     conn-fields:
> >>> > >>       sslmode:
> >>> > >>         type: string
> >>> > >>         label: SSL Mode
> >>> > >>         enum: ["disable", "prefer", "require"]
> >>> > >>         default: "prefer"
> >>> > >>
> >>> > >>       timeout:
> >>> > >>         type: integer
> >>> > >>         label: Timeout
> >>> > >>         range: [1, 300]
> >>> > >>         default: 30
> >>> > >>
> >>> > >> The schema will now consist of two new sections:
> >>> > >>
> >>> > >> 1. ui-field-behaviour
> >>> > >> - Used to customize the standard connection fields (host, port, login, etc.)
> >>> > >> - hidden-fields: Hide some fields
> >>> > >> - relabeling: Change labels for some fields (like schema -> Database above)
> >>> > >> - placeholders: Show hints in the form (port 5432, for example)
> >>> > >>
> >>> > >> 2. conn-fields
> >>> > >> - Can be used to define custom fields stored in Connection.extra
> >>> > >> - You can define inline validators like enum, range, pattern, min-length, max-length
> >>> > >> - Will support the standard wtforms string, integer, boolean, and number types
> >>> > >>
> >>> > >> As for why this schema was chosen, check the comparison with alternatives in the PR description: https://github.com/apache/airflow/pull/60410
> >>> > >>
> >>> > >>
> >>> > >> Current Status
> >>> > >>
> >>> > >> I have a POC in https://github.com/apache/airflow/pull/60410, where I chose two pilot providers of varying difficulty: HTTP and SMTP (HTTP is easy, just a vanilla form, but SMTP has some hidden fields).
> >>> > >>
> >>> > >>
> >>> > >> Benefits this will offer
> >>> > >>
> >>> > >> - Once complete, the API server won't import any hook classes for UI rendering, leading to faster startup
> >>> > >> - Provider dependencies don't affect the API server
> >>> > >> - YAML is easier to read/write than Python functions for form metadata
> >>> > >>
> >>> > >> Would love feedback on:
> >>> > >> 1. Schema design - does it cover your use cases?
> >>> > >> 2. Any missing field types or validators?
> >>> > >>
> >>> > >> The goal is to get the pilot providers in so we can start migrating providers incrementally. The old way still works, so there is no rush for everyone to migrate at once.
> >>> > >>
> >>> > >> Thoughts?
> >>> > >>
> >>> > >> Thanks & Regards,
> >>> > >> Amogh Desai
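A sketch of how the API server could apply a parsed `ui-field-behaviour` section to the standard connection fields without importing any hook. The field list comes from GCP's `hidden_fields` above; the default labels (`name.capitalize()`), the `placeholders` entry for port 5432, and the `build_standard_fields` name are assumptions added for illustration, since the Postgres example does not define placeholders.

```python
from typing import Any

# Standard Connection form fields, as listed in GCP's hidden_fields.
STANDARD_FIELDS = ["host", "schema", "login", "password", "port", "extra"]


def build_standard_fields(behaviour: dict[str, Any]) -> list[dict[str, Any]]:
    """Apply a parsed `ui-field-behaviour` section to the standard fields.

    Returns the visible fields in order, with their (possibly relabeled)
    labels and optional placeholders. Hypothetical helper.
    """
    hidden = set(behaviour.get("hidden-fields", []))
    relabeling = behaviour.get("relabeling", {})
    placeholders = behaviour.get("placeholders", {})
    return [
        {
            "name": name,
            # Assumed default label: the capitalized field name.
            "label": relabeling.get(name, name.capitalize()),
            "placeholder": placeholders.get(name),
        }
        for name in STANDARD_FIELDS
        if name not in hidden
    ]


# The Postgres example from the proposal after YAML parsing, plus an assumed placeholder.
postgres_behaviour = {
    "hidden-fields": [],
    "relabeling": {"schema": "Database"},
    "placeholders": {"port": "5432"},
}
```

Because the section is plain data, the same function handles GCP's case (all six fields hidden, empty result) and SMTP-style partial hiding without any provider code.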
