denimalpaca opened a new issue, #25163:
URL: https://github.com/apache/airflow/issues/25163

   ### Apache Airflow Provider(s)
   
   common-sql
   
   ### Versions of Apache Airflow Providers
   
   `apache-airflow-providers-common-sql==1.0.0`
   
   ### Apache Airflow version
   
   2.3.3 (latest released)
   
   ### Operating System
   
   macOS Monterey 12.3.1
   
   ### Deployment
   
   Astronomer
   
   ### Deployment details
   
   _No response_
   
   ### What happened
   
   - `SQLTableCheckOperator` builds multiple checks in such a way that if two 
or more checks are given, and one is not a fully aggregated statement, then the 
SQL fails as it is missing a `GROUP BY` clause.
   - `SQLColumnCheckOperator` provides only the last SQL query built from the 
columns, so when a check fails, it will only give the correct SQL in the 
exception statement by coincidence.
   
   ### What you think should happen instead
   
   - Multiple checks should not need a `GROUP BY` clause
   - Either the correct SQL statement, or no SQL statement, should be returned 
in the exception message.
   
   ### How to reproduce
   
   For the `SQLTableCheckOperator`, using the operator like so:
   ```
   table_cheforestfire_costs_table_checkscks = SQLTableCheckOperator(
               task_id="forestfire_costs_table_checks",
               table=SNOWFLAKE_FORESTFIRE_COST_TABLE,
               checks={
                   "row_count_check": {"check_statement": "COUNT(*) = 9"},
                   "total_cost_check": {"check_statement": "land_damage_cost + 
property_damage_cost + lost_profits_cost = total_cost"}
               }
           )
   ```
   
   For the `SQLColumnCheckOperator`, using the operator like so:
   ```
   cost_column_checks = SQLColumnCheckOperator(
       task_id="cost_column_checks",
       table=SNOWFLAKE_COST_TABLE,
       column_mapping={
           "ID": {"null_check": {"equal_to": 0}},
           "LAND_DAMAGE_COST": {"min": {"geq_to": 0}},
           "PROPERTY_DAMAGE_COST": {"min": {"geq_to": 0}},
           "LOST_PROFITS_COST": {"min": {"geq_to": 0}},
       }
   )
   ```
   and ensuring that any of the `ID`, `LAND_DAMAGE_COST`, or 
`PROPERTY_DAMAGE_COST` checks fail.
   
   An example DAG with the correct environment and data can be found 
[here](https://github.com/astronomer/airflow-data-quality-demo/blob/main/dags/snowflake_examples/complex_snowflake_transform.py).
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to