Re: [PR] fix(sql): stop sanitize_clause from rewriting user SQL semantics (#36113) [superset]

via GitHub Tue, 23 Jun 2026 19:51:24 -0700


codeant-ai-for-open-source[bot] commented on code in PR #41125:
URL: https://github.com/apache/superset/pull/41125#discussion_r3464230377



##########
tests/unit_tests/sql/parse_tests.py:
##########
@@ -3208,6 +3213,54 @@ def test_sanitize_clause(sql: str, expected: str | 
Exception, engine: str) -> No
             sanitize_clause(sql, engine)
 
 
[email protected](
+    "engine",
+    ["postgresql", "redshift", "cockroachdb", "netezza", "hana", "base", 
"mysql"],
+)
+def test_sanitize_clause_preserves_aggregation_semantics(engine: str) -> None:
+    """
+    Regression test for https://github.com/apache/superset/issues/36113.
+
+    `sanitize_clause` must not silently rewrite a user-authored expression. The
+    Postgres SQLGlot dialect (which several engines borrow) rewrites
+    ``ROUND(AVG(x), n)`` to ``ROUND(CAST(AVG(x) AS DECIMAL), n)`` at generation
+    time. On engines whose unqualified ``DECIMAL`` defaults to scale 0 (e.g.
+    Redshift, Netezza) the injected cast rounds the aggregate to an integer
+    *before* the explicit ``ROUND``, producing wrong results.
+
+    The clause must be returned unchanged regardless of the engine dialect.
+    """
+    clause = "ROUND(AVG(col), 4)"
+    sanitized = sanitize_clause(clause, engine)
+    assert "CAST" not in sanitized.upper(), (
+        f"sanitize_clause injected a cast for engine {engine!r}: {sanitized!r}"
+    )
+    assert sanitized == clause
+
+
[email protected](
+    "engine",
+    ["postgresql", "redshift", "cockroachdb", "netezza", "hana", "base", 
"mysql"],
+)
+def test_sanitize_clause_preserves_aggregation_semantics_with_comment(
+    engine: str,
+) -> None:
+    """
+    Regression test for https://github.com/apache/superset/issues/36113.
+
+    A clause that contains a comment takes the re-rendering branch of
+    ``sanitize_clause``. That branch must normalize comments using the *base*
+    dialect rather than the engine dialect, so it must not re-apply the 
Postgres
+    ``ROUND(AVG(x), n)`` -> ``ROUND(CAST(AVG(x) AS DECIMAL), n)`` rewrite that
+    truncates results on engines where ``DECIMAL`` defaults to scale 0.
+    """
+    clause = "ROUND(AVG(col), 4) /* precise_count_distinct=true */"
+    sanitized = sanitize_clause(clause, engine)
+    assert "CAST" not in sanitized.upper(), (
+        f"sanitize_clause injected a cast for engine {engine!r}: {sanitized!r}"

Review Comment:
   **Suggestion:** This regression test is too weak for the comment-handling 
path: it only checks that `CAST` is absent, so it will still pass if 
`sanitize_clause` drops or rewrites the comment/hint payload entirely. Add an 
assertion that the returned clause still preserves the user-authored 
expression/comment content (or at least that the expected comment survives), 
otherwise a real semantic regression in comment normalization can slip through 
undetected. [incomplete implementation]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Commented aggregation clauses may lose hints without test failure.
   - ⚠️ Future sanitize_clause regressions on comments remain partially 
unguarded.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Open `superset/sql/parse.py` and inspect `sanitize_clause` at lines 
1844–1847 where the
   function is defined to validate a SQL clause and return it unchanged, with a 
special
   re-rendering branch for clauses containing comments (lines 1834–1847 in the 
snippet).
   
   2. Note the comment-handling branch at `superset/sql/parse.py:1836–1846`, 
where clauses
   with comments are re-rendered via `_normalized_generator(..., 
comments=True)`; this branch
   is responsible for normalizing and preserving comments while avoiding 
engine-specific
   semantic rewrites.
   
   3. Open `tests/unit_tests/sql/parse_tests.py` and inspect
   `test_sanitize_clause_preserves_aggregation_semantics_with_comment` at lines 
3257–3260:
   the test builds `clause = "ROUND(AVG(col), 4) /* precise_count_distinct=true 
*/"` (line
   3257), calls `sanitize_clause(clause, engine)` (line 3258), and only asserts 
that `"CAST"`
   is absent from `sanitized.upper()` (lines 3259–3260), without checking that 
the comment or
   full clause text is preserved.
   
   4. Introduce a regression to the comment-handling path in `sanitize_clause` 
(for example,
   change the comment branch in `superset/sql/parse.py` to drop comments while 
returning the
   same `ROUND(AVG(col), 4)` expression), then run `pytest
   
tests/unit_tests/sql/parse_tests.py::test_sanitize_clause_preserves_aggregation_semantics_with_comment`;
   observe that the test still passes because it only enforces the absence of 
`CAST`,
   allowing a real comment-stripping or comment-rewriting bug on 
`ROUND(AVG(...))` clauses to
   go undetected.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=7ebeaef9c2044e639057ace2ad6aa88d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=7ebeaef9c2044e639057ace2ad6aa88d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/sql/parse_tests.py
   **Line:** 3258:3260
   **Comment:**
        *Incomplete Implementation: This regression test is too weak for the 
comment-handling path: it only checks that `CAST` is absent, so it will still 
pass if `sanitize_clause` drops or rewrites the comment/hint payload entirely. 
Add an assertion that the returned clause still preserves the user-authored 
expression/comment content (or at least that the expected comment survives), 
otherwise a real semantic regression in comment normalization can slip through 
undetected.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41125&comment_hash=29b4a7d633d9f6d121429aad0d96d1dc8aac07786c30726e443cd2fa55deb6cc&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41125&comment_hash=29b4a7d633d9f6d121429aad0d96d1dc8aac07786c30726e443cd2fa55deb6cc&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(sql): stop sanitize_clause from rewriting user SQL semantics (#36113) [superset]

Reply via email to