dabla commented on code in PR #38715:
URL: https://github.com/apache/airflow/pull/38715#discussion_r1555654733


##########
airflow/providers/common/sql/hooks/sql.py:
##########
@@ -531,8 +539,6 @@ def insert_rows(
         target_fields=None,
         commit_every=1000,
         replace=False,
-        *,
-        executemany=False,

Review Comment:
   > Yes. Now - looking at it again and seeing the comments - I think it's a 
bit "too" aggressive. We have to make sure that DBApiHook is as 
backwards-compatible as possible for as long as possible.
   > 
   > While dapi spec explains that executemany **should** be implemented, we've 
seen a number of implementations differences where the implementations behave 
in bit unpredictable ways - the DBApi specification had not been strictly 
followed - and it's not detailed enough - we had problems with it in Snowflake/ 
Databricks especially.
   > 
   > The risk we are facing is that someone's DAGs using Databricks or 
Snowflake or other implementations of DBAPI hook will suddenly start behaving 
differentlty after that change (and there might be other providers outside of 
community ones that are using DBAPIHook as well).
   > 
   > So I think we should minimise the risk for "All DBAPIHook" users and keep 
old defaults and behaviour by defailt (allowing to switch both executemany and 
fast_executemany flags) deliberately. While adding `executemany` option in 
#37246 was new feature, this one is changing default behaviour which is too 
much for DBAPIHook.
   > 
   > For Teradata and PyODBC providers we could introduce a breaking version 
change and have both executemany and fast_executemany set to True, this is 
fine, but it should not be a default change for DBApiHook, because it changes 
the behaviour too much - potentially.
   > 
   > Summarizing:
   > 
   > * it's fine to commonalize the Teradata code into common hook with 
`executemany` and `fast_executemany` flags set to True by default (but also we 
can keep them configurable).
   > * we should keep both executemany and fast_executemany as parameters 
(default = False for  both in DBApiHook to keep compatibility)
   > * Add a comment to `fast_executemany` parameter that this is applicable 
for some providers only (Teradata/Odbc) - this is the easiest way to implement 
it without complicating DBApiHook.
   > 
   > This will allow to keep the code backwards compatible, have common code 
from Teradata and "faster" defaults for both Teradata and PyODBC, while also 
provide escape hatch for the users who want to flip them back for whatever 
reason.
   
   I've re-added the original implementation of the insert_rows with takes into 
account the executemany parameter.  Then as you proposed I've overriden the 
insert_rows method in OdbcHook and TerraDataHook so there the executemany 
parameter is set to True by default.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to