sadpandajoe commented on code in PR #41125:
URL: https://github.com/apache/superset/pull/41125#discussion_r3463338575


##########
superset/sql/parse.py:
##########
@@ -1840,16 +1840,35 @@ def process_jinja_sql(
 
 def sanitize_clause(clause: str, engine: str) -> str:
     """
-    Make sure the SQL clause is valid.
+    Validate a SQL clause and return it unchanged.
+
+    The clause is parsed to ensure it is a single, well-formed statement. We
+    intentionally return the *original* text rather than a re-rendered version:
+    round-tripping user SQL through SQLGlot's dialect generator can silently
+    alter semantics. For example, the Postgres dialect (borrowed by several
+    engines) rewrites ``ROUND(AVG(x), n)`` to ``ROUND(CAST(AVG(x) AS DECIMAL),
+    n)``, which rounds the value to an integer before the explicit ``ROUND`` on
+    engines whose unqualified ``DECIMAL`` defaults to scale 0 (see #36113).
+
+    Comments are the one exception: a trailing line comment can comment out
+    surrounding SQL once the clause is embedded into a larger query (e.g.
+    wrapped in parentheses), so any clause that contains comments is 
re-rendered
+    to normalize them into a safe form. A trailing statement terminator is
+    likewise stripped, since callers embed the clause inside a larger fragment
+    (``WHERE (...)``) where a stray ``;`` would produce invalid SQL.
     """
     try:
         statement = SQLStatement(clause, engine)
+        parsed = statement._parsed  # pylint: disable=protected-access
+        if not any(node.comments for node in parsed.walk()):
+            return clause.rstrip().rstrip(";").rstrip()

Review Comment:
   The comment path still re-introduces the bug this PR fixes. Comment-free 
clauses take this verbatim return and avoid the CAST injection, but any clause 
that contains a comment skips it and falls through to 
`_normalized_generator(SQLGLOT_DIALECTS.get(engine), ...).generate(...)` below, 
which re-applies the Postgres rewrite.
   
   Repro: metric `ROUND(AVG(col), 4) /* precise_count_distinct=true */` on a 
Redshift (or Netezza / CockroachDB / HANA) connection. The generator produces 
`ROUND(CAST(AVG(col) AS DECIMAL), 4) /* ... */`, DECIMAL defaults to scale 0, 
and the chart returns `1` instead of `0.9496`. Same #36113 truncation, just 
gated behind the presence of a comment. Block comments do show up in real 
metric SQL (the `test_sanitize_clause` parametrize already includes `TRUE /* 
precise_count_distinct=true */`).
   
   Could the comment path normalize through the base dialect instead of the 
engine dialect, e.g. `_normalized_generator(None, pretty=False, 
comments=True)`? That keeps the line-comment safety without the 
Postgres-specific CAST rewrite.



##########
tests/unit_tests/sql/parse_tests.py:
##########
@@ -3208,6 +3213,31 @@ def test_sanitize_clause(sql: str, expected: str | 
Exception, engine: str) -> No
             sanitize_clause(sql, engine)
 
 
[email protected](
+    "engine",
+    ["postgresql", "redshift", "cockroachdb", "netezza", "hana", "base", 
"mysql"],
+)
+def test_sanitize_clause_preserves_aggregation_semantics(engine: str) -> None:
+    """
+    Regression test for https://github.com/apache/superset/issues/36113.
+
+    `sanitize_clause` must not silently rewrite a user-authored expression. The
+    Postgres SQLGlot dialect (which several engines borrow) rewrites
+    ``ROUND(AVG(x), n)`` to ``ROUND(CAST(AVG(x) AS DECIMAL), n)`` at generation
+    time. On engines whose unqualified ``DECIMAL`` defaults to scale 0 (e.g.
+    Redshift, Netezza) the injected cast rounds the aggregate to an integer
+    *before* the explicit ``ROUND``, producing wrong results.
+
+    The clause must be returned unchanged regardless of the engine dialect.
+    """
+    clause = "ROUND(AVG(col), 4)"
+    sanitized = sanitize_clause(clause, engine)
+    assert "CAST" not in sanitized.upper(), (
+        f"sanitize_clause injected a cast for engine {engine!r}: {sanitized!r}"
+    )
+    assert sanitized == clause

Review Comment:
   This regression test only exercises the comment-free clause, which takes the 
verbatim path, so it won't catch the comment-path gap. Could you add a 
parametrized variant with a comment so the fix is locked in:
   
   ```python
   clause = "ROUND(AVG(col), 4) /* precise_count_distinct=true */"
   assert "CAST" not in sanitize_clause(clause, engine).upper()
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to