dabla commented on code in PR #38715:
URL: https://github.com/apache/airflow/pull/38715#discussion_r1555654733
##########
airflow/providers/common/sql/hooks/sql.py:
##########
@@ -531,8 +539,6 @@ def insert_rows(
target_fields=None,
commit_every=1000,
replace=False,
- *,
- executemany=False,
Review Comment:
> Yes. Now - looking at it again and seeing the comments - I think it's a
bit "too" aggressive. We have to make sure that DBApiHook is as
backwards-compatible as possible for as long as possible.
>
> While dapi spec explains that executemany **should** be implemented, we've
seen a number of implementations differences where the implementations behave
in bit unpredictable ways - the DBApi specification had not been strictly
followed - and it's not detailed enough - we had problems with it in Snowflake/
Databricks especially.
>
> The risk we are facing is that someone's DAGs using Databricks or
Snowflake or other implementations of DBAPI hook will suddenly start behaving
differentlty after that change (and there might be other providers outside of
community ones that are using DBAPIHook as well).
>
> So I think we should minimise the risk for "All DBAPIHook" users and keep
old defaults and behaviour by defailt (allowing to switch both executemany and
fast_executemany flags) deliberately. While adding `executemany` option in
#37246 was new feature, this one is changing default behaviour which is too
much for DBAPIHook.
>
> For Teradata and PyODBC providers we could introduce a breaking version
change and have both executemany and fast_executemany set to True, this is
fine, but it should not be a default change for DBApiHook, because it changes
the behaviour too much - potentially.
>
> Summarizing:
>
> * it's fine to commonalize the Teradata code into common hook with
`executemany` and `fast_executemany` flags set to True by default (but also we
can keep them configurable).
> * we should keep both executemany and fast_executemany as parameters
(default = False for both in DBApiHook to keep compatibility)
> * Add a comment to `fast_executemany` parameter that this is applicable
for some providers only (Teradata/Odbc) - this is the easiest way to implement
it without complicating DBApiHook.
>
> This will allow to keep the code backwards compatible, have common code
from Teradata and "faster" defaults for both Teradata and PyODBC, while also
provide escape hatch for the users who want to flip them back for whatever
reason.
I've re-added the original implementation of the insert_rows with takes into
account the executemany parameter. Then as you proposed I've overriden the
insert_rows method in OdbcHook and TerraDataHook so there the executemany
parameter is set to True by default.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]