Hey guys! Thanks for all your feedback and suggestions. I see here couple of examples with postgresql, however I would like to remind that postgresql it is not a valid connection type which doesn't register into the providers manager, and we have an exclusive hack to turn this connection type into the valid Airflow Postgres Connection type, see: https://github.com/apache/airflow/blob/71b512d314e14b2e0904f2d3a8ca3b00aa21fdc7/airflow/models/connection.py#L185-L187 As far as I remember all attempts to extend this hack by additional types were rejected.
I also do not think that the idea of extending connection type by plus sign would succeed. It just increase complexibility of URI parser, required to support it until Airflow 3 and we also need to cover some cases like airflow+postgresql+pg8000://scott:tiger@localhost/test, just because we can't handle this cases. What makes me think this? Well our past experience shows that extending/fixing URI parsing breaks something else. The Issue #33442 shows an attempt to deal with `:` without encoding strings; it is required to add 100 lines of code just to avoid limitations of URI and our parser. If we would like to compare with SQLAlchemy it supports both URI and keyword arguments during creation Engines, and for some engines or parameters (e.g. pools) it is strictly required to set them through keyword arguments only. That is in contrast to Airflow Connections where all parameters are sent by a single blob of the data: retrive from DB, deser URI, deser JSON. If we would like to draw an analogy with browser connection, then SA URI is a POST request and Airflow Connection is a GET request. I still think different people have different expectations of the Airflow URI format. My expectation is that this is only a SerDe mechanism around Airflow Connection. If we have a look at the codebase of Airflow we found that this is not always true and we try to use it as SA URI. If someone would like to extend a brainstorm I would like to suggest reworking our Connections and making ConnectionsV2, in the future we will definitely find that this approach is wrong and create ConnectionsV3. If not interesting, you could stop at that point. What is the connection right now? It is just set of conn_id, conn_type, description, host, login, password, schema, port and plus extra for avoid limitations of other parameters Are there any connections using these fields? No. Are some connections expected to have a token/secret instead of password? Yes Are some connections expecting URI instead of host? Yes Is it confusing users? Yep, regularly in discussions/issues/slack Are connection types really matter? Only for DBApiHooks Are connections in the UI consistent with these fields? No Do we have a good mechanism for auto documentation? No Do we have a mechanism for validation parameters? Only in the UI and limited by extra fields. Do connections provide different auth methods which are mutually exclusive? Oh yeeeeah! Is there any way to slightly change the connection from the user side? No, you should create your own Hook which uses its own Operators. Let's introduce some old suggestions (only highlights) 1. Instead of predefined fields make a flexible schema, e.g. replace all existing fields except conn_id, description to the JSON object into the DB. 2. Flexible schema doesn't mean that it accept everything, so we need to validate it, e.g. JSONSchema 3. Decouple ConnectionsV2 from the Hooks, and replace "conn_type" by "implements". What does it mean? Hooks do not directly obtain raw connection, instead of that Airflow retrieve connection, validate with a schema, is it valid "implements" and create some pydantic/dataclass object and provide it into the Hook. What does this object also contain? Basically creation of the client or some similar. 4. Multiple different ConnectionsV2 might implement the same "implements" and be registered into the provider manager. 5. Auto Generate documentation from the pydantic objects (or similar) 6. Use schema for generate fields in the UI and validate inputs 7. Since we have multiple implementations, each new connection with the same "implements" would have multiple options, which could be selected by drop down list. I think it is more handy than extra fields with mutually exclusive fields. 8. Maybe even subtypes. Probably if you still read this you might have a question: "Ok, but how is it related to Airflow URI?" Let return to expectations, user want to use " https://example.org/foo/bar/spam/egg", let's give this opportunity to create own ConnectionV2 (and maybe contribute it back) which accept simple json {"uri": "some-awesome-uri"} And that all, want to provide your URI which is not related to Airflow URI? Implement new Connection, you do not need to rewrite all Hooks/Operators. Want to implement something insecure? Well implement it, deploy into the Airflow cluster. And disadvantages: - A lot of rework - A lot of backcompat code, e.g. Hooks should works with current Connection as well as ConnectionV2 until miracle happen - A lot of pain. What happens with Airflow Connection URI format in this case? It do not required, hard to implement due to flexible schema and should work only with current Connection but not with V2 That is just a raw idea and to be honest I think it is hardly possible to implement this one in Airflow 2.x ---- Best Wishes *Andrey Anshin* On Sun, 19 Nov 2023 at 02:25, utkarsh sharma <utkarshar...@gmail.com> wrote: > I do think the URI format for airflow connections also serves the purpose > of conveying the intent, similar to how we URI in a browser to connect to a > website. We use Ariflow Connection URI to connect to one of the services > supported by Airflow. Due to this reason, I lean toward Jarek's proposal of > adding "airflowconnection+" or just "airflow+" prefix to the schema, which > should also solve its misinterpretation as a common URI. > Also, I share the concerns regarding the breaking changes. > > So I would like to vote -1. > > Thanks, > Utkarsh Sharma > > > On Sun, Nov 19, 2023 at 3:21 AM Bolke de Bruin <bdbr...@gmail.com> wrote: > > > So -1 on deprecating. I think we should do the opposite and go all in. > The > > suggestion by Jarek makes sense in this case, as it remains and is fully > > standards compliant. The JSON representation isn't: it is our own crafted > > format. > > > > My 2 cents, > > > > Bolke > > > > On Sat, 18 Nov 2023 at 22:15, Jarek Potiuk <ja...@potiuk.com> wrote: > > > > > I agree with Ash that it's much easier to define a lot of simpler > > > connections this way. But it is also very confusing the way now how > you > > > can mix the "real" url with "airflow connection" URL. > > > And Daniel is very right about the magnitude of breaking change. > > > > > > But possibly there is a way to eat cake and have it too. > > > > > > Answering Andrey's thinking: 1) I propose to deprecate it (raise > warning) > > > and even remove it from documentation > > > But following Daniel's "pain": 2) possibly we should not even schedule > > it > > > for removal for Airflow 3 > > > And answering Ash's concern: 3) we should likely figure a non-confusing > > way > > > how to define connection in "airlfow way". > > > > > > We could be following SQLlAlchemy approach - and define connection with > > > "airflowconnection+" prefix in the schema > > > > > > AIRFLOW_CONN_MY_DB=airflowconnection+postgresql://user:pass@host > > > > > > I think this would remove all the confusion, keep an easy way of > defining > > > connection but also avoid "heavily" breaking change. > > > > > > J. > > > > > > > > > > > > On Sat, Nov 18, 2023 at 5:41 PM Daniel Standish > > > <daniel.stand...@astronomer.io.invalid> wrote: > > > > > > > The thing that makes *me* hesitant to deprecate is the sheer > magnitude > > of > > > > breaking it would bring (even though we're only talking about a > > > > hypothetical 3.0 release), balanced against the actual pain it > causes. > > > > I.e. it's confusing to use, and takes up space in docs (when, if > > removed, > > > > we could just show one good way of doing things), but, it doesn't > > create > > > > that much of a maintenance burden anymore, and has certainly become > > more > > > > reliable and easy to use over time (e.g. since we added get_uri). > > > > > > > > > > > > On Sat, Nov 18, 2023 at 3:30 AM Ash Berlin-Taylor <a...@apache.org> > > > wrote: > > > > > > > > > -1 — for the simple case the URI format is perfect: > > > > > > > > > > AIRFLOW_CONN_MY_DB=postgresql://user:pass@host > > > > > > > > > > We can push JSON format for more complex cases, and/or limit URI to > > > only > > > > > the simple cases as long as we don’t remove it entirely. > > > > > > > > > > -ash > > > > > > > > > > > On 17 Nov 2023, at 22:44, Andrey Anshin < > andrey.ans...@taragol.is> > > > > > wrote: > > > > > > > > > > > > Greetings! > > > > > > > > > > > > I want to propose to deprecate Airflow Connection URI > > representation > > > > and > > > > > > remove it in Airflow 3 in favor of the already existing > > replacement - > > > > > JSON > > > > > > representation. > > > > > > > > > > > > In the past URI representation helped to add one of the awesome > > > > features > > > > > - > > > > > > Alternative Secrets Backends: Environment Variables, Files or > > > Provider > > > > > > specific Backends. > > > > > > > > > > > > However I have a feeling that nowadays it have some sort of > > > limitations > > > > > due > > > > > > to the parser, see: > > > > > > - https://github.com/apache/airflow/issues/33442 > > > > > > > > > > > > Or miss interpretation that Airflow connection it is something > more > > > > than > > > > > > just representation of Airflow Connection and it is a the > > replacement > > > > to > > > > > > any other type of URIs such as: HTTP, Postgres URIs, SQLAlchemy > > URI, > > > > > Spark > > > > > > URI and etc > > > > > > > > > > > > And attempt to fix it > > > > > > - https://github.com/apache/airflow/issues/10913 > > > > > > - https://github.com/apache/airflow/pull/27796 > > > > > > - https://github.com/apache/airflow/pull/35712 > > > > > > > > > > > > And attempt to use it in DbApiHooks, for detail see: > > > > > > - > https://lists.apache.org/thread/8rhmz3qh30hvkondct4sfmgk4vd07mn5 > > > > > > > > > > > > In addition, complicated Connection in the URI format is not > human > > > > > > readable, requires additional decoding, and it is hard to create > it > > > > > > manually. > > > > > > > > > > > > The only one beneficial part of URI representation is ability to > > > create > > > > > it > > > > > > programatically > > > > > > > > > > > > > > > > > > > > > https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html#generating-a-connection-uri > > > > > > , there is no such of method for generate JSON, but I don't see > the > > > > > problem > > > > > > to create such of method in Connection object which return string > > > > > > representation of JSON connection. > > > > > > > > > > > > So my suggestion would be pretty simple: > > > > > > 1. Deprecate accept URI as parameter to the Connection object > > > > > > 2. Deprecate uri property > > > > > > 3. Replace get_uri by get_json > > > > > > 4. Replace all URI examples in Providers Connections by JSON > > > > alternatives > > > > > > > > > > > > > > > > > > WDYT? Maybe I miss something and we should keep Airflow > Connection > > as > > > > URI > > > > > > in the future Airflow's major versions and support both ways to > > > provide > > > > > > connections as JSON and URI. > > > > > > > > > > > > ---- > > > > > > Best Wishes > > > > > > *Andrey Anshin* > > > > > > > > > > > > > > > > > > > > > > > -- > > > > -- > > Bolke de Bruin > > bdbr...@gmail.com > > >