rusackas commented on code in PR #41125:
URL: https://github.com/apache/superset/pull/41125#discussion_r3464125095


##########
superset/sql/parse.py:
##########
@@ -1840,16 +1840,39 @@ def process_jinja_sql(
 
 def sanitize_clause(clause: str, engine: str) -> str:
     """
-    Make sure the SQL clause is valid.
+    Validate a SQL clause and return it unchanged.
+
+    The clause is parsed to ensure it is a single, well-formed statement. We
+    intentionally return the *original* text rather than a re-rendered version:
+    round-tripping user SQL through SQLGlot's dialect generator can silently
+    alter semantics. For example, the Postgres dialect (borrowed by several
+    engines) rewrites ``ROUND(AVG(x), n)`` to ``ROUND(CAST(AVG(x) AS DECIMAL),
+    n)``, which rounds the value to an integer before the explicit ``ROUND`` on
+    engines whose unqualified ``DECIMAL`` defaults to scale 0 (see #36113).
+
+    Comments are the one exception: a trailing line comment can comment out
+    surrounding SQL once the clause is embedded into a larger query (e.g.
+    wrapped in parentheses), so any clause that contains comments is 
re-rendered
+    to normalize them into a safe form. That re-rendering uses the *base* 
dialect
+    rather than the engine dialect, so it normalizes comments without 
re-applying
+    the engine-specific rewrites (e.g. the Postgres ``ROUND``/``CAST`` rewrite
+    from #36113) that we deliberately avoid above. A trailing statement
+    terminator is likewise stripped, since callers embed the clause inside a
+    larger fragment (``WHERE (...)``) where a stray ``;`` would produce invalid
+    SQL.
     """
     try:
         statement = SQLStatement(clause, engine)
+        parsed = statement._parsed  # pylint: disable=protected-access
+        if not any(node.comments for node in parsed.walk()):
+            return clause.rstrip().rstrip(";").rstrip()
+

Review Comment:
   Trading a bit of cache-key fragmentation for not silently corrupting user 
SQL is the whole point of this PR (#36113). Two clauses that differ only in 
whitespace/casing missing a shared cache entry is harmless; round-tripping 
through the generator to canonicalize is exactly what was changing semantics. 
Leaving the verbatim path as-is.



##########
superset/sql/parse.py:
##########
@@ -1840,16 +1840,39 @@ def process_jinja_sql(
 
 def sanitize_clause(clause: str, engine: str) -> str:
     """
-    Make sure the SQL clause is valid.
+    Validate a SQL clause and return it unchanged.
+
+    The clause is parsed to ensure it is a single, well-formed statement. We
+    intentionally return the *original* text rather than a re-rendered version:
+    round-tripping user SQL through SQLGlot's dialect generator can silently
+    alter semantics. For example, the Postgres dialect (borrowed by several
+    engines) rewrites ``ROUND(AVG(x), n)`` to ``ROUND(CAST(AVG(x) AS DECIMAL),
+    n)``, which rounds the value to an integer before the explicit ``ROUND`` on
+    engines whose unqualified ``DECIMAL`` defaults to scale 0 (see #36113).
+
+    Comments are the one exception: a trailing line comment can comment out
+    surrounding SQL once the clause is embedded into a larger query (e.g.
+    wrapped in parentheses), so any clause that contains comments is 
re-rendered
+    to normalize them into a safe form. That re-rendering uses the *base* 
dialect
+    rather than the engine dialect, so it normalizes comments without 
re-applying
+    the engine-specific rewrites (e.g. the Postgres ``ROUND``/``CAST`` rewrite
+    from #36113) that we deliberately avoid above. A trailing statement
+    terminator is likewise stripped, since callers embed the clause inside a
+    larger fragment (``WHERE (...)``) where a stray ``;`` would produce invalid
+    SQL.
     """
     try:
         statement = SQLStatement(clause, engine)
+        parsed = statement._parsed  # pylint: disable=protected-access
+        if not any(node.comments for node in parsed.walk()):
+            return clause.rstrip().rstrip(";").rstrip()
+
         return _normalized_generator(
-            SQLGLOT_DIALECTS.get(engine),
+            None,
             pretty=False,
             comments=True,
         ).generate(
-            statement._parsed,  # pylint: disable=protected-access
+            parsed,
             copy=True,
         )

Review Comment:
   The base dialect on the comment path is deliberate, see the thread with 
@sadpandajoe above. Re-rendering in the engine dialect is what reintroduces the 
Postgres `ROUND`/`CAST` rewrite from #36113. Base SQLGlot keeps standard syntax 
intact and only the comment-bearing path hits it, so the verbatim path still 
preserves engine-specific SQL untouched.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to