Yeah I think your proposal seems reasonable.  Airflow conn URI is not same
thing as sqlalchemy URI.  That worked for some simple circumstances but it
is definitely not true in general.  Hooks that need it should generally
implement it.

On Tue, Sep 19, 2023 at 10:51 AM Andrey Anshin <andrey.ans...@taragol.is>
wrote:

> Historically before `DbApiHook` was introduced some "DB-API 2" Hooks (I put
> in quotes because not all properly implements this API),
> implements this method in each hook separately, and some of them are not
> implemented at all.
>
> Everything changed since DbApiHook introduced and now 23 Community Hooks
> based on it. DbApiHook implements it by interesting way.
>
>     def get_uri(self) -> str:
>         """
>         Extract the URI from the connection.
>
>         :return: the extracted uri.
>         """
>         conn = self.get_connection(getattr(self, self.conn_name_attr))
>         conn.schema = self.__schema or conn.schema
>         return conn.get_uri()
>
>
> And it use by default in
>
>     def get_sqlalchemy_engine(self, engine_kwargs=None):
>         """
>         Get an sqlalchemy_engine object.
>
>         :param engine_kwargs: Kwargs used in
> :func:`~sqlalchemy.create_engine`.
>         :return: the created engine.
>         """
>         if engine_kwargs is None:
>             engine_kwargs = {}
>         return create_engine(self.get_uri(), **engine_kwargs)
>
> IMO there are two bad assumptions:
> 1. "You could transform the Airflow Connection URI to the SQLAlchemy URI".
> Many of the providers also have additional extra parameters which are not
> supported by SQLalchemy.
> In additional every changes in Airflow Connection URI format might breaks
> result even if it worked before:
> https://github.com/apache/airflow/issues/34456
>
> 2. "All DB implements SQLALchemy Dialect (
> https://docs.sqlalchemy.org/en/20/dialects/) and we have packages for it"
>
> I've split Hooks by different groups
>
> Group 1 (7 Hooks): In the first glance works fine but better if someone
> also could have a look
>
> - SnowflakeHook from `snowflake` provider
> - BigQueryHook from `google` provider
> - SpannerHook from `google` provider
> - RedshiftSQLHook from `amazon` provider
> - OdbcHook from `odbc` provider
> - DrillHook from `apache.drill` provider
> - DruidDbApiHook from `apache.druid` provider
>
>
> Group 2 (10 Hooks): Do not overwrite DbApiHook.get_uri implementation
>
> - MySqlHook from `mysql` provider. In additional this hook provide support
> for two different drivers:
>   + mysqlclient: https://pypi.org/project/mysqlclient/
>   + mysql-connector-python:
> https://pypi.org/project/mysql-connector-python/
> - OracleHook from `oracle` provider
> - JdbcHook from `jdbc` provider
> - TrinoHook from `trino` provider
> - HiveServer2Hook from `apache.hive` provider
> - ImpalaHook from `apache.impala` provider
> - DatabricksSqlHook from `databricks` provider
> - ExasolHook from `exasol` provider
> - PrestoHook from `presto` provider
> - VerticaHook fromvertica` provider
>
>
> Group 3 (3 Hooks): Uses Connection.get_uri for construct SQLalchemy URI
>
> - PostgresHook from `postgres` provider
> - MsSqlHook from `microsoft.mssql` provider
> - SqliteHook from `sqlite` provider
>
>
> Group 4 (2 Hooks): Implementation which would not work with
> `get_sqlalchemy_engine` method
>
> - PinotDbApiHook from `apache.pinot` provider, seems like it generate URI
> for broker
> - ElasticsearchSQLHook from `elasticsearch` provider, `elasticsearch` (
> https://github.com/elastic/elasticsearch-py) do not provide
> DB-API2/SQLAlchemy interface, `elasticsearch-dbapi` (
> https://github.com/preset-io/elasticsearch-dbapi/) was removed in
> apache-airflow-providers-elasticsearch>=5.0
>
>
> Group 5 (1 Hook): Hooks from suspended providers
>
> - QuboleCheckHook
>
> I'm not sure what we can do in the current situation.
> My proposal:
> 1. Add UserWarning in DbApiHook.get_uri and maybe in
> DbApiHook.get_sqlalchemy_engine which inform that this methods might not
> work correctly without re-implementation
> 2. Implements `get_uri` without Connection.get_uri for Group 3
> 3. Raise NotImplemented error in `get_sqlalchemy_engine` for Group 4
>
>
> My hope for collective wisdom/mind and help in solving this puzzle by
> proposing other ideas.
>
> ----
> Best Wishes
> *Andrey Anshin*
>

Reply via email to