codeant-ai-for-open-source[bot] commented on code in PR #41125:
URL: https://github.com/apache/superset/pull/41125#discussion_r3464230377
##########
tests/unit_tests/sql/parse_tests.py:
##########
@@ -3208,6 +3213,54 @@ def test_sanitize_clause(sql: str, expected: str |
Exception, engine: str) -> No
sanitize_clause(sql, engine)
[email protected](
+ "engine",
+ ["postgresql", "redshift", "cockroachdb", "netezza", "hana", "base",
"mysql"],
+)
+def test_sanitize_clause_preserves_aggregation_semantics(engine: str) -> None:
+ """
+ Regression test for https://github.com/apache/superset/issues/36113.
+
+ `sanitize_clause` must not silently rewrite a user-authored expression. The
+ Postgres SQLGlot dialect (which several engines borrow) rewrites
+ ``ROUND(AVG(x), n)`` to ``ROUND(CAST(AVG(x) AS DECIMAL), n)`` at generation
+ time. On engines whose unqualified ``DECIMAL`` defaults to scale 0 (e.g.
+ Redshift, Netezza) the injected cast rounds the aggregate to an integer
+ *before* the explicit ``ROUND``, producing wrong results.
+
+ The clause must be returned unchanged regardless of the engine dialect.
+ """
+ clause = "ROUND(AVG(col), 4)"
+ sanitized = sanitize_clause(clause, engine)
+ assert "CAST" not in sanitized.upper(), (
+ f"sanitize_clause injected a cast for engine {engine!r}: {sanitized!r}"
+ )
+ assert sanitized == clause
+
+
[email protected](
+ "engine",
+ ["postgresql", "redshift", "cockroachdb", "netezza", "hana", "base",
"mysql"],
+)
+def test_sanitize_clause_preserves_aggregation_semantics_with_comment(
+ engine: str,
+) -> None:
+ """
+ Regression test for https://github.com/apache/superset/issues/36113.
+
+ A clause that contains a comment takes the re-rendering branch of
+ ``sanitize_clause``. That branch must normalize comments using the *base*
+ dialect rather than the engine dialect, so it must not re-apply the
Postgres
+ ``ROUND(AVG(x), n)`` -> ``ROUND(CAST(AVG(x) AS DECIMAL), n)`` rewrite that
+ truncates results on engines where ``DECIMAL`` defaults to scale 0.
+ """
+ clause = "ROUND(AVG(col), 4) /* precise_count_distinct=true */"
+ sanitized = sanitize_clause(clause, engine)
+ assert "CAST" not in sanitized.upper(), (
+ f"sanitize_clause injected a cast for engine {engine!r}: {sanitized!r}"
Review Comment:
**Suggestion:** This regression test is too weak for the comment-handling
path: it only checks that `CAST` is absent, so it will still pass if
`sanitize_clause` drops or rewrites the comment/hint payload entirely. Add an
assertion that the returned clause still preserves the user-authored
expression/comment content (or at least that the expected comment survives),
otherwise a real semantic regression in comment normalization can slip through
undetected. [incomplete implementation]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Commented aggregation clauses may lose hints without test failure.
- ⚠️ Future sanitize_clause regressions on comments remain partially
unguarded.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Open `superset/sql/parse.py` and inspect `sanitize_clause` at lines
1844–1847 where the
function is defined to validate a SQL clause and return it unchanged, with a
special
re-rendering branch for clauses containing comments (lines 1834–1847 in the
snippet).
2. Note the comment-handling branch at `superset/sql/parse.py:1836–1846`,
where clauses
with comments are re-rendered via `_normalized_generator(...,
comments=True)`; this branch
is responsible for normalizing and preserving comments while avoiding
engine-specific
semantic rewrites.
3. Open `tests/unit_tests/sql/parse_tests.py` and inspect
`test_sanitize_clause_preserves_aggregation_semantics_with_comment` at lines
3257–3260:
the test builds `clause = "ROUND(AVG(col), 4) /* precise_count_distinct=true
*/"` (line
3257), calls `sanitize_clause(clause, engine)` (line 3258), and only asserts
that `"CAST"`
is absent from `sanitized.upper()` (lines 3259–3260), without checking that
the comment or
full clause text is preserved.
4. Introduce a regression to the comment-handling path in `sanitize_clause`
(for example,
change the comment branch in `superset/sql/parse.py` to drop comments while
returning the
same `ROUND(AVG(col), 4)` expression), then run `pytest
tests/unit_tests/sql/parse_tests.py::test_sanitize_clause_preserves_aggregation_semantics_with_comment`;
observe that the test still passes because it only enforces the absence of
`CAST`,
allowing a real comment-stripping or comment-rewriting bug on
`ROUND(AVG(...))` clauses to
go undetected.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=7ebeaef9c2044e639057ace2ad6aa88d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=7ebeaef9c2044e639057ace2ad6aa88d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/sql/parse_tests.py
**Line:** 3258:3260
**Comment:**
*Incomplete Implementation: This regression test is too weak for the
comment-handling path: it only checks that `CAST` is absent, so it will still
pass if `sanitize_clause` drops or rewrites the comment/hint payload entirely.
Add an assertion that the returned clause still preserves the user-authored
expression/comment content (or at least that the expected comment survives),
otherwise a real semantic regression in comment normalization can slip through
undetected.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41125&comment_hash=29b4a7d633d9f6d121429aad0d96d1dc8aac07786c30726e443cd2fa55deb6cc&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41125&comment_hash=29b4a7d633d9f6d121429aad0d96d1dc8aac07786c30726e443cd2fa55deb6cc&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]