[GitHub] [airflow] potiuk commented on issue #27976: `SQLColumnCheckOperator` failures after upgrading to `common-sql==1.3.0`

GitBox Tue, 29 Nov 2022 17:07:33 -0800


potiuk commented on issue #27976:
URL: https://github.com/apache/airflow/issues/27976#issuecomment-1331514882


   OK. @denimalpaca @kazanzhy. 
   
   I know why it is happening - it is Snowflake-only thing. And I am not sure 
yet how to fix it in a good way.
   
   The root cause of the problem is this line in Snowflake Hook's run() method:
   
   ```
    with closing(conn.cursor(DictCursor)) as cur: 
   ```
   
   The result of it is that after the #26944 change, get_records() and 
get_first() changed the result type to return dictionaries and not sequences 
(previously each of those methods had their own implementations and did not use 
run() method, so they used "standard" sensor).
   
   Simply speaking: SnowflakeHook's run() method is not standard - instead of 
sequence of sequences, it returns sequence of dicts. This is absoluately not 
standard behaviour for DBApI - this is simply creative aproach of snowflake 
python connector to return Dicts: 
https://docs.snowflake.com/en/user-guide/python-connector-api.html#cursor 
   
   I think the best fix is: 
   
   * as of common-sql 1.3.1 we don not need to use DictCursor any more. We keep 
".description" field in the hook and we can
    use it to convert the return value in Operator from Sequences to Dics (see 
below)
   * we standardise SnowflakeHook's run() method to return sequences by default 
(breaking change - but only for "deep" usages  for where hook is used by custom 
operators or taskflow cases)
   * we keep the SnowflakeOperator behaviour to return Dicts (when do_xcom_push 
= True) to avoid breaking changes at "task dependency" level (we can easily 
convert the output by combining description + results from the Hook into return 
value of the SnowflakeOperator. 
   
   This would be almost identical  change to what I've done for Databricks last 
week (only Databricks internals were different - no DictCursor and it returned 
Tuples rather than Dicts):
   
   
https://airflow.apache.org/docs/apache-airflow-providers-databricks/stable/index.html#breaking-changes
   
   > The DatabricksSqlHook is now conforming to the same semantics as all the 
other DBApiHook implementations and returns the same kind of response in its 
run method. Previously (pre 4.* versions of the provider, the Hook returned 
Tuple of (“cursor description”, “results”) which was not compatible with other 
DBApiHooks that return just “results”. After this change (and dependency on 
common.sql >= 1.3.1), The DatabricksSqlHook returns now “results” only. The 
description can be retrieved via last_description field of the hook after run 
method completes.
   
   That would require Snowflake 5.0.0 breaking change release, and basically 
making whole 4.0.* yanked. Or probably better - we could relase 4.0.2 with 
adding this breaking change and yanking 4.0.0, 4.0.1 as "wrong". Making 4.0.2 
the "real breaking" change we actually wanted (or started to want) to make in 
4.0* . 4.0.* is very fresh and has not been released yet in constraints.
   
   The end result will be that we will finally (I hope) standardise the 
semantics of DBApiHook.run(). So far, each of the hooks could change it (and 
some did as we see) - this change seems to finally got all the run() methods to 
behave in the same way.
   
   WDYT Everyone? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [airflow] potiuk commented on issue #27976: `SQLColumnCheckOperator` failures after upgrading to `common-sql==1.3.0`

Reply via email to