Just some history ...  and a renaming suggestion...

When get_uri was added, generating the connection URI was a real pain, and
Airflow provided no way to generate one, and JSON repr was not yet
implemented.  By adding get_uri, a user could define the connection object
we needed in python, and airflow would give us the URI we needed to set in
our env vars.  This was the only purpose.

I don't think at the time I saw the potential for confusion with the
sqlalchemy URI.  In hindsight, get_uri may not be the best name.  Recently,
a method was added `to_json_dict` which gives the json repr.  Maybe we
should rename get_uri to be consistent and more indicative of the intention.

We could rename it `to_uri_str`.  Or, maybe better we could make the names
consistent and do `to_json_repr` and `to_uri_repr`.  Or something like
that?  @andrey WDYT?  Adding the _repr suffix may help indicate that this
is the URI (or json) representation of the airflow connection object.




On Sun, Nov 26, 2023 at 10:52 PM Tzu-ping Chung <t...@astronomer.io.invalid>
wrote:

> Some context: I’ve been looking into putting the Connection URI to more
> use, most importantly to consolidate it with OpenLineage’s URI format.
>
> Skipping the details, one of the conclusions I had is get_uri on
> Connection is not actually that useful, due to how different libraries
> (SQLAlchemy, Airflow Object Store, OpenLineage, etc.) have drastically
> different URI formats, and it is not practical to implement get_uri in
> Connection to be compatible with all of them. The correct layer to generate
> a connection URI is in hooks instead, and the format does not need to be
> visible to the user, but internal between the hook and whatever library it
> talks to, since the URI semantics are too specific to a library to be
> generally useful as a serde format.
>
> Therefore I think deprecating the URI Connection representation would not
> be harmful, except maybe the URI can be more useful in simplistic cases.
>
>
> > On 27 Nov 2023, at 12:21, Amogh Desai <amoghdesai....@gmail.com> wrote:
> >
> > A little late to the party here but convinced by what Daniel had to say.
> >
> > I am ok with the idea of adding this into the deprecation bucket in the
> > coming time.
> >
> > Implementing the get_uri in the hook seems quite reasonable to me
> > on the grounds that the Connection class' get_uri is meant to be general,
> > supporting the use case of serializing the Airflow connection on basis of
> > host, port, schema, password, type etc.
> >
> > Thanks & Regards,
> > Amogh Desai
> >
> > On Mon, Nov 20, 2023 at 9:56 PM Daniel Standish
> > <daniel.stand...@astronomer.io.invalid> wrote:
> >
> >>>
> >>> I still think different people have different expectations of the
> Airflow
> >>> URI format. My expectation is that this is only a SerDe mechanism
> around
> >>> Airflow Connection. If we have a look at the codebase of Airflow we
> found
> >>> that this is not always true and we try to use it as SA URI.
> >>
> >>
> >> Yes, this is in my view exactly the problem.  I believe we run into
> trouble
> >> only when we falsely think that we should be able to define an Airflow
> >> connection using, say, a jdbc connection string. It's not true in
> general.
> >>
> >> Observe that DbApiHook has a get_uri
> >> <
> >>
> https://github.com/apache/airflow/blob/064fc2b7751a44e37ccce97609cff7c496098e56/airflow/providers/common/sql/hooks/sql.py#L188-L196
> >>>
> >> method.
> >>
> >> This method is used to produce a connection string valid for the service
> >> you're connecting to, and importantly it may be different from the
> >> Connection.get_uri.
> >>
> >> The primary responsibility of Airflow connection URI is to serialize the
> >> airflow connection object which has type, login, pass, host, schema
> port,
> >> extra and that's it.  For simple cases it can serve both purposes but
> it's
> >> not true in general and folks need to understand that.
> >>
> >> I think part of the problem may simply be that `uri` is an argument of
> the
> >> Connection object. It means not the connection string but the airflow
> conn
> >> URI serialization.  Would probably be a good candidate for deprecation
> >> on the basis that it is confusing.
> >>
> >> Fundamentally, as a hook designer, if you need a conn string that is a
> URI,
> >> you must implement get_uri on the hook in the right way to be able to
> >> use the attributes available to you in Connection.  Connection.get_uri
> will
> >> handle creating the Airflow Conn URI serialization and that's a separate
> >> thing.
> >>
> >> The connection object has no clue how to generate a valid connection
> string
> >> for your service, but the hook can have this understanding.
> >>
> >> There are many examples in the codebase where users implement get_uri on
> >> the hook and it works just fine.  A hook author could write it so you
> can
> >> optionally dump the full connection string in the extra object:
> {"extra":
> >> {"uri": "blah"}}.
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@airflow.apache.org
> For additional commands, e-mail: dev-h...@airflow.apache.org
>
>

Reply via email to