codeant-ai-for-open-source[bot] commented on code in PR #41125:
URL: https://github.com/apache/superset/pull/41125#discussion_r3425027593


##########
superset/sql/parse.py:
##########
@@ -1686,15 +1686,32 @@ def process_jinja_sql(
 
 def sanitize_clause(clause: str, engine: str) -> str:
     """
-    Make sure the SQL clause is valid.
+    Validate a SQL clause and return it unchanged.
+
+    The clause is parsed to ensure it is a single, well-formed statement. We
+    intentionally return the *original* text rather than a re-rendered version:
+    round-tripping user SQL through SQLGlot's dialect generator can silently
+    alter semantics. For example, the Postgres dialect (borrowed by several
+    engines) rewrites ``ROUND(AVG(x), n)`` to ``ROUND(CAST(AVG(x) AS DECIMAL),
+    n)``, which rounds the value to an integer before the explicit ``ROUND`` on
+    engines whose unqualified ``DECIMAL`` defaults to scale 0 (see #36113).
+
+    Comments are the one exception: a trailing line comment can comment out
+    surrounding SQL once the clause is embedded into a larger query (e.g.
+    wrapped in parentheses), so any clause that contains comments is 
re-rendered
+    to normalize them into a safe form.
     """
     try:
         statement = SQLStatement(clause, engine)
-        dialect = SQLGLOT_DIALECTS.get(engine)
+        parsed = statement._parsed  # pylint: disable=protected-access
+        if not any(node.comments for node in parsed.walk()):
+            return clause

Review Comment:
   **Suggestion:** Returning the raw `clause` for comment-free inputs now 
preserves trailing statement terminators (for example `col = 1;`). 
`SQLStatement` accepts that as a single parsed statement, so this branch passes 
validation, but downstream callers embed the clause inside larger SQL fragments 
(`WHERE (...)` / `HAVING (...)`), where the preserved `;` can produce invalid 
SQL at runtime. Normalize or reject trailing semicolons before returning the 
original text. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ Charts with SQL adhoc filters may fail execution.
   - ⚠️ Semantic layer extras where/having with semicolons break.
   - ⚠️ Explore form_data where clauses propagate invalid SQL.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Create an Explore chart using a SQL adhoc filter with `expressionType == 
"SQL"` and
   `clause == "WHERE"` whose predicate ends with a semicolon, for example 
`region = 'NA';`.
   This filter payload is stored in `form_data["adhoc_filters"]` and passed into
   `split_adhoc_filters_into_base_filters` in `superset/utils/core.py` (lines 
1395-1403).
   
   2. Inside `split_adhoc_filters_into_base_filters` (`superset/utils/core.py`, 
lines
   1415-1423), the function enters the `expression_type == "SQL"` branch, reads 
the user SQL
   from `adhoc_filter["sqlExpression"]`, and calls `sql_expression =
   sanitize_clause(sql_expression, engine)` at line 34. This invokes 
`sanitize_clause` in
   `superset/sql/parse.py` (lines 1687-1699) with the clause `region = 'NA';`.
   
   3. In `sanitize_clause` (`superset/sql/parse.py`, lines 45-55), a 
`SQLStatement` is
   constructed at line 46, which parses the clause using SQLGlot via
   `SQLStatement._parse_statement` and `_parse` (see `superset/sql/parse.py`, 
lines 245-76).
   For a syntactically valid single statement like `region = 'NA';` with no 
comments,
   `parsed.walk()` yields nodes with `node.comments` empty. The condition at 
line 1707 (`if
   not any(node.comments for node in parsed.walk()):`) becomes true, so 
`sanitize_clause`
   returns the raw `clause` value, preserving the trailing semicolon (`"region 
= 'NA';"`).
   
   4. Control returns to `split_adhoc_filters_into_base_filters`, which appends 
the sanitized
   clause (still `"region = 'NA';"`) to `sql_where_filters` and finally builds
   `form_data["where"]` via `form_data["where"] = " AND ".join([f"({sql})" for 
sql in
   sql_where_filters])` (`superset/utils/core.py`, lines 39-40). With a single 
filter this
   produces `form_data["where"] == "(region = 'NA';)"`. The query-building 
pipeline
   (`build_query_dicts_from_form_data` and `merge_form_data_filters_into_query` 
in
   `superset/mcp_service/chart/chart_helpers.py`, lines 136-215 and 190-69) 
copies this
   string into each query dict as `query["where"]`. When the SQL is rendered 
for execution
   (as evidenced by tests like `test_merge_form_data_filters_into_query` in
   `tests/unit_tests/mcp_service/chart/test_chart_helpers.py`, lines 248-259), 
the backend
   emits a WHERE clause containing `(region = 'NA';)` inside the statement, 
which is invalid
   SQL on most engines, causing runtime query failures for charts using such 
filters.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=2ad9377e8fea4bb689d747a09ac871ad&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=2ad9377e8fea4bb689d747a09ac871ad&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/sql/parse.py
   **Line:** 1707:1708
   **Comment:**
        *Logic Error: Returning the raw `clause` for comment-free inputs now 
preserves trailing statement terminators (for example `col = 1;`). 
`SQLStatement` accepts that as a single parsed statement, so this branch passes 
validation, but downstream callers embed the clause inside larger SQL fragments 
(`WHERE (...)` / `HAVING (...)`), where the preserved `;` can produce invalid 
SQL at runtime. Normalize or reject trailing semicolons before returning the 
original text.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41125&comment_hash=1b008080f8031a8de9e9d253553452ed6e82798256ca703f3f2d228c31a11848&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41125&comment_hash=1b008080f8031a8de9e9d253553452ed6e82798256ca703f3f2d228c31a11848&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to