eladkal opened a new issue, #23112:
URL: https://github.com/apache/airflow/issues/23112

   ### Body
   
   **TL;DR:**
   Multiple providers need to handle the logic of splitting SQL statements
   
[Snowflake.run()](https://github.com/apache/airflow/blob/501a3c3fbefbcc0d6071a00eb101110fc4733e08/airflow/providers/snowflake/hooks/snowflake.py#L287-L347) and [RedShift.run()](https://github.com/apache/airflow/pull/22391/files) already do this, and very similar logic would be needed should we eventually create a `PrestoOperator` or `TrinoOperator`.
   
   I want to discuss the idea of extracting the statement-splitting logic into 
`DbApiHook`,
   e.g. creating a `DbApiHook.run_many()` that splits the statements and uses 
`DbApiHook.run()` under the hood.
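
To make the idea concrete, here is a minimal sketch of what such a `run_many()` might look like. The method name, signature, and the naive split are all assumptions for illustration, not existing Airflow API:

```python
class DemoHook:
    """Stand-in for DbApiHook, used only to illustrate the idea."""

    def __init__(self):
        self.executed = []

    def run(self, sql, autocommit=False, parameters=None):
        # the real DbApiHook.run() opens a connection and executes the
        # statement; here we just record what would be executed
        self.executed.append(sql)

    def run_many(self, sql, autocommit=False, parameters=None):
        """Hypothetical helper: split a semicolon-separated SQL string
        into individual statements and delegate each one to run()."""
        if isinstance(sql, str):
            # naive split on ';' for illustration; a real implementation
            # would use a SQL-aware parser (semicolons can legally appear
            # inside string literals and comments)
            sql = [s.strip() + ";" for s in sql.split(";") if s.strip()]
        for statement in sql:
            self.run(statement, autocommit=autocommit, parameters=parameters)


hook = DemoHook()
hook.run_many("SELECT 1; SELECT 2;")
# hook.executed is now ['SELECT 1;', 'SELECT 2;']
```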
   
   On one hand, it would reduce duplicated code and make it easier to develop 
operators for these providers.
   On the other hand, implementing it in `DbApiHook` means bumping the minimum 
supported Airflow version for these providers (once they actually use the new 
function)?
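
One way to soften the version-bump concern, sketched under the assumption that a `run_many()` lands in a newer `DbApiHook`: providers could probe for the method and fall back to their own splitting on older Airflow versions. All names here are hypothetical:

```python
def execute_statements(hook, sql):
    # hypothetical provider-side shim: prefer the (assumed) new
    # DbApiHook.run_many() when the installed Airflow has it...
    if hasattr(hook, "run_many"):
        return hook.run_many(sql)
    # ...otherwise fall back to the provider's own naive splitting
    statements = [s.strip() + ";" for s in sql.split(";") if s.strip()]
    return [hook.run(s) for s in statements]


class OldHook:
    """Stand-in for a hook on an older Airflow without run_many()."""

    def run(self, sql):
        return sql


executed = execute_statements(OldHook(), "SELECT 1; SELECT 2;")
```

This would keep the providers installable on older Airflow, at the cost of temporarily carrying both code paths.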
   
   ----------------------------
   **The long story:**
   Some DBs like MySQL, PostgreSQL, MsSQL are able to run a batch of SQL 
statements:
   `SELECT 1; SELECT 2;`
   
   So you can do:
   `MySQLOperator(.., sql='SELECT 1; SELECT 2;')`
   it will invoke :
   
https://github.com/apache/airflow/blob/501a3c3fbefbcc0d6071a00eb101110fc4733e08/airflow/hooks/dbapi.py#L163
   
   which executes the SQL in a single query. (The `run` function converts 
this SQL to `['SELECT 1; SELECT 2;']` - note this is a list with **1 cell**.)
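
Simplified, the relevant wrapping step of the default `run()` behaves like this (a sketch, not the actual hook code):

```python
def wrap_sql(sql):
    # DbApiHook.run() wraps a plain string in a one-element list, so
    # 'SELECT 1; SELECT 2;' reaches the database as a single query
    if isinstance(sql, str):
        sql = [sql]
    return sql


statements = wrap_sql("SELECT 1; SELECT 2;")
# statements == ['SELECT 1; SELECT 2;']  -- one cell, not two
```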
   
   
   But in other DBs like: Snowflake, Redshift, Trino, Presto
   
   You cannot run `SELECT 1; SELECT 2;` as is. You must split the statement.
   This makes the default `run()` unusable for these DBs.
   
   
   For example, in Snowflake, when users pass `SELECT 1; SELECT 2;` what we 
actually do is create:
   `['SELECT 1;', 'SELECT 2;']` - note this is a list with **2 cells**
   
https://github.com/apache/airflow/blob/501a3c3fbefbcc0d6071a00eb101110fc4733e08/airflow/providers/snowflake/hooks/snowflake.py#L314-L316
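
The splitting step can be sketched as follows. This is a naive illustration only; the real hooks use a SQL-aware splitter that also handles semicolons inside string literals and comments:

```python
def split_statements(sql):
    # split on ';' and re-append the terminator to each statement
    return [s.strip() + ";" for s in sql.split(";") if s.strip()]


parts = split_statements("SELECT 1; SELECT 2;")
# parts == ['SELECT 1;', 'SELECT 2;']  -- two cells
```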
   
   
   **The problem?**
   These providers (Snowflake, Redshift, Trino, Presto) **are forced to 
override `dbapi.run()`**, yet they all need exactly the same functionality.
   
   If you take a look at 
[Snowflake.run()](https://github.com/apache/airflow/blob/501a3c3fbefbcc0d6071a00eb101110fc4733e08/airflow/providers/snowflake/hooks/snowflake.py#L287-L347)
 you will see that it's almost identical to 
[RedShift.run()](https://github.com/apache/airflow/pull/22391/files), and the 
logic would be very similar should we eventually create a `PrestoOperator` or 
`TrinoOperator`.
   
   
   This was already brought up earlier in 
https://github.com/apache/airflow/pull/15533#discussion_r620481685
   
   
   **So why not just make the sql param accept a list only?**
   Because we want to support passing a `.sql` file, so we must handle 
splitting the statements ourselves.
   ```
   SomeSqlOperator(
       ...,
       sql='my_queries.sql',
   )
   ```
   We read the content of the file and load it as a single string.
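
A minimal sketch of why file input forces splitting (the file name and contents are made up for illustration):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # simulate a templated .sql file that a user points the operator at
    sql_file = Path(tmp) / "my_queries.sql"
    sql_file.write_text("SELECT 1;\nSELECT 2;\n")

    # the operator reads the whole file as ONE string; for DBs that
    # cannot run batched statements, the hook must split it afterwards
    sql = sql_file.read_text()
```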
   
   
   --------------------------
   
   **What do others think?**
   Should we create `DbApiHook.run_many()`, or leave it as it is today, where 
every provider needs to handle the logic on its own?
   
   
   
   
   
   
   ### Committer
   
   - [X] I acknowledge that I am a maintainer/committer of the Apache Airflow 
project.

