potiuk commented on code in PR #38715:
URL: https://github.com/apache/airflow/pull/38715#discussion_r1555556847


##########
airflow/providers/common/sql/hooks/sql.py:
##########
@@ -531,8 +539,6 @@ def insert_rows(
         target_fields=None,
         commit_every=1000,
         replace=False,
-        *,
-        executemany=False,

Review Comment:
   Yes. Now - looking at it again and seeing the comments - I think it's a bit 
"too" aggressive. We have to make sure that DBApiHook is as 
backwards-compatible as possible for as long as possible.
   
   While dapi spec explains that executemany **should** be implemented, we've 
seen a number of implementations differences where the implementations behave 
in bit unpredictable ways - the DBApi specification had not been strictly 
followed - and it's not detailed enough -  we had problems with it in 
Snowflake/ Databricks especially. 
   
   The risk we are facing is that someone's DAGs using Databricks or Snowflake 
or other implementations of DBAPI hook will suddenly start behaving 
differentlty after that change (and there might be other providers outside of 
community ones that are using DBAPIHook as well). 
   
   So I think we should minimise the risk for "All DBAPIHook" users and keep 
old defaults and behaviour by defailt (allowing to switch both executemany and 
fast_executemany flags) deliberately. While adding `executemany` option in 
https://github.com/apache/airflow/pull/37246 was new feature, this one is 
changing default behaviour which is too much for DBAPIHook.
   
   For Teradata and PyODBC providers we could introduce a breaking version 
change and have both executemany and fast_executemany set to True, this is 
fine, but it should not be a default change for DBApiHook, because it changes 
the behaviour too much - potentially.
   
   Summarizing:
   
   * it's fine to commonalize the Teradata code into common hook with 
`executemany` and `fast_executemany` flags set to True by default (but also we 
can keep them configurable).
   * we should keep both executemany and fast_executemany as parameters 
(default = False for  both in DBApiHook to keep compatibility)
   * Add a comment to `fast_executemany` parameter that this is applicable for 
some providers only (Teradata/Odbc) - this is the easiest way to implement it 
without complicating DBApiHook.
   
   This will allow to keep the code backwards compatible, have common code from 
Teradata and "faster" defaults for both Teradata and PyODBC, while also provide 
escape hatch for the users who want to flip them back for whatever reason.
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to