potiuk commented on issue #34419:
URL: https://github.com/apache/airflow/issues/34419#issuecomment-1776787025
Copying @uranusjr from #34308 which is the discusion that was converted
that issue (let's continue it here).
> RFC 3986 indeed does not allow underscore so the warning is legit. It
works for urllib.parse but there’s no guarantee it works with other things, or
will in the future. If anything I’d suggest start disallowing it entirely
instead. Why is allowing the underscore important? Why can’t we switch to say
google-cloud-platform instead?
> Thinking about this a bit more, maybe we should just start sliently
coercing underscores to dashes instead. Allowing (say) both
google-clould-platform and google_could_platform as connection types seems like
an unnecessary source of possible bugs that benefits mostly nobody.
I think it's not only about coertion, but mostly about clarification what
the `get_uri` method does. Coercion would be abreaking change and I think
potential uses of the `get_uri` should be looked at and decision made what to
do with it.
I believe the `get_uri` method intention was not really defined. We can
assume that URI implies RFC 3986, so we can also treat it as a bugfix - and
take the risk of breaking someone's workflow (but it requires SIGNIFICANT note
in release notes).
But we have to be very clear about which `get_uri` we are taking and what is
the relation with all `get_uri` methods.
Currently there is `get_uri` method in Connection:
```
def get_uri(self) -> str:
"""Return connection in URI format."""
```
There is also a `get_uri` method in Common-sql DBAPIHook:
```
def get_uri(self) -> str:
"""
Extract the URI from the connection.
:return: the extracted uri.
"""
conn = self.get_connection(getattr(self, self.conn_name_attr))
conn.schema = self.__schema or conn.schema
return conn.get_uri()
```
There are also 15 other implementation in a number of hooks that are
inheriting DBAPIHooks - with varying degrees of cusomization - but I believe
most of the implementation return the RFC3986 compliant urls.
A bit of the problem there is that semantically speaking - it's not clear
"what kind" of the URI should be returned here. For example there is a
different URI when you want to connect to a database with sqlalchemy, and
different drivers for the same database might also have different
schemes/parameters etc. So it is not entirely clear what kind of URL the
`get_uri` method should return. Also some databases define their own schemes
(odbc/jdbc has particularly interesting one as it might contain something that
looks like there are two schemes but there is one:
```
jdbc:mysql://localhost:3306/sonoo
```
In this casee `jdbc:mysql` is the URI scheme (not jdbc as you might imagine
- `:` is perfectly ok as part of the scheme as long as it is not followed by
"//`.) . It's still RFC3986 compliaant.
So IMHO - we should just review all those implementations, update the
description and clarify something along the lines of :
**The method returns an RFC3986 compliant URI for the database that can be
used by arbitrary chosen driver for the database as the URI that can be used by
the driver to connect to the databse. It's not an URI that will work for
`sqlalchemy` driver URI, just one of the native drivers.**
I think also we should review all the providers of ours and see if the
change will impact the reult of the method and enumerate them in the release
notes.
If we make a SIGNIFICANT note and mention that get_uri methods for listed
provider connections will behaving differently, I think this is quite a good
approach.
WDYT @uranusjr ? I am happy to review if someone does all that and if
others agree that this is a good direction to take.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]