Re: [PR] fix(sql): preserve multi-arg DISTINCT in sanitize_clause and format [superset]

via GitHub Wed, 17 Jun 2026 21:43:37 -0700


codeant-ai-for-open-source[bot] commented on code in PR #39340:
URL: https://github.com/apache/superset/pull/39340#discussion_r3433096555



##########
superset/sql/parse.py:
##########
@@ -137,6 +139,32 @@ class CTASMethod(enum.Enum):
     VIEW = enum.auto()
 
 
+def _normalized_generator(
+    dialect_name: DialectType,
+    *,
+    pretty: bool,
+    comments: bool,
+) -> Generator:
+    """
+    Generator that preserves multi-argument DISTINCT expressions.
+
+    Build a sqlglot generator that preserves user-written multi-argument
+    DISTINCT expressions verbatim. Postgres, Presto, Trino, and DuckDB
+    set ``MULTI_ARG_DISTINCT = False`` to emulate the unsupported
+    ``COUNT(DISTINCT a, b)`` idiom via a ``CASE WHEN`` row-expression, which
+    silently corrupts user-defined aggregates that natively accept multiple
+    arguments. Superset's sanitize / format paths normalize user SQL — they
+    do not transpile — so the emulation is undesirable here.
+    """
+    base_cls = Dialect.get_or_raise(dialect_name).generator_class

Review Comment:
   **Suggestion:** `Dialect.get_or_raise` will raise when `dialect_name` is 
`None`, but several supported engine specs are intentionally not present in 
`SQLGLOT_DIALECTS` (for example `databend`, `denodo`, `kylin`, etc.). This 
makes both `sanitize_clause()` and `SQLStatement.format()` fail at runtime for 
those engines instead of falling back to base formatting. Add a safe fallback 
dialect (for example base dialect) before calling `get_or_raise`. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Explore filters on Databend/Denodo/Kylin crash during validation.
   - ❌ SQL Lab queries on unmapped engines fail during formatting.
   - ⚠️ Cache-key SQL normalization broken for unmapped SQLGLOT dialects.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a database using the Databend engine spec, whose `engine` 
attribute is
   `"databend"` (see `superset/db_engine_specs/databend.py:104-109` where
   `DatabendEngineSpec.engine = "databend"`). This makes 
`database.db_engine_spec.engine`
   equal to `"databend"` throughout the code.
   
   2. Create or open a chart on this Databend database and add a SQL filter in 
the "WHERE" or
   "HAVING" fields. During query execution, 
`QueryContextProcessor.get_df_payload()` in
   `superset/common/query_context_processor.py:3-10` calls 
`query_obj.validate()`, which in
   turn runs the loop shown in `superset/common/query_object.py:3-34` and 
computes `engine =
   database.db_engine_spec.engine` then calls `sanitized_clause = 
sanitize_clause(clause,
   engine)`.
   
   3. Inside `sanitize_clause()` in `superset/sql/parse.py:1841-1854`, 
`statement =
   SQLStatement(clause, engine)` parses the clause using `SQLStatement._parse` 
with `dialect
   = SQLGLOT_DIALECTS.get(engine)`. For `engine="databend"`, the mapping 
`SQLGLOT_DIALECTS`
   in `superset/sql/parse.py:60-122` has `# "databend": ???` commented out and 
no entry for
   `"databend"`, so `SQLGLOT_DIALECTS.get("databend")` returns `None`.
   
   4. Still in `sanitize_clause()`, the generator call `return
   _normalized_generator(SQLGLOT_DIALECTS.get(engine), pretty=False,
   comments=True).generate(statement._parsed, copy=True)` passes 
`dialect_name=None` into
   `_normalized_generator()`. In `_normalized_generator()` at
   `superset/sql/parse.py:142-165`, `base_cls =
   Dialect.get_or_raise(dialect_name).generator_class` attempts 
`Dialect.get_or_raise(None)`,
   which raises a runtime error because `None` is not a valid `DialectType`. 
This exception
   is not a `SupersetParseError`, so it is not caught by the `except 
SupersetParseError` in
   `sanitize_clause()`, causing an uncaught server error when validating 
filters for Databend
   (and similarly unmapped engines like `"denodo"` and `"kylin"`).
   
   5. Separately, run any SQL Lab query against the same Databend database. In 
SQL execution,
   `_prepare_statement_blocks()` in 
`superset/sql/execution/celery_task.py:130-18` calls
   `parsed_script = SQLScript(rendered_query, engine=db_engine_spec.engine)`. 
For
   `engine="databend"`, `SQLScript.special_engines` in 
`superset/sql/parse.py:1539-1544` does
   not include `"databend"`, so `statement_class` defaults to `SQLStatement` 
and each
   statement is a `SQLStatement` instance.
   
   6. In `SQLStatement.__init__` at `superset/sql/parse.py:701-709`, 
`self._dialect =
   SQLGLOT_DIALECTS.get(engine)` again sets `self._dialect = None` for 
`"databend"`. Later,
   when `_prepare_statement_blocks()` calls either `parsed_script.format(...)` 
or
   `statement.format(...)` (see `superset/sql/execution/celery_task.py:10-15`),
   `SQLStatement.format()` at `superset/sql/parse.py:938-946` invokes
   `_normalized_generator(self._dialect, pretty=True, comments=comments)`. With
   `self._dialect is None`, `_normalized_generator()` again executes
   `Dialect.get_or_raise(None)` and raises a runtime error, causing SQL Lab 
query execution
   for these engines to fail before any statements are sent to the database.
   
   7. The same failure mode will affect any other engine with a defined
   `db_engine_spec.engine` string but no corresponding entry in 
`SQLGLOT_DIALECTS`, as
   indicated by commented-out placeholders like `# "denodo": ???` and `# 
"kylin": ???` in
   `superset/sql/parse.py:70-91`. Once any of these engines is used with chart 
filters or SQL
   Lab, the missing fallback dialect leads to reproducible crashes in both
   `sanitize_clause()` and `SQLStatement.format()`.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=cbf516bde7da4c1e808ede0fc1105396&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=cbf516bde7da4c1e808ede0fc1105396&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/sql/parse.py
   **Line:** 159:159
   **Comment:**
        *Api Mismatch: `Dialect.get_or_raise` will raise when `dialect_name` is 
`None`, but several supported engine specs are intentionally not present in 
`SQLGLOT_DIALECTS` (for example `databend`, `denodo`, `kylin`, etc.). This 
makes both `sanitize_clause()` and `SQLStatement.format()` fail at runtime for 
those engines instead of falling back to base formatting. Add a safe fallback 
dialect (for example base dialect) before calling `get_or_raise`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39340&comment_hash=f8ac9b2e429c7bf183f2a71d948543209bc899f492f7bc16b007fc3a55ccd316&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39340&comment_hash=f8ac9b2e429c7bf183f2a71d948543209bc899f492f7bc16b007fc3a55ccd316&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(sql): preserve multi-arg DISTINCT in sanitize_clause and format [superset]

Reply via email to