codeant-ai-for-open-source[bot] commented on code in PR #39340:
URL: https://github.com/apache/superset/pull/39340#discussion_r3433096555
##########
superset/sql/parse.py:
##########
@@ -137,6 +139,32 @@ class CTASMethod(enum.Enum):
VIEW = enum.auto()
+def _normalized_generator(
+ dialect_name: DialectType,
+ *,
+ pretty: bool,
+ comments: bool,
+) -> Generator:
+ """
+ Generator that preserves multi-argument DISTINCT expressions.
+
+ Build a sqlglot generator that preserves user-written multi-argument
+ DISTINCT expressions verbatim. Postgres, Presto, Trino, and DuckDB
+ set ``MULTI_ARG_DISTINCT = False`` to emulate the unsupported
+ ``COUNT(DISTINCT a, b)`` idiom via a ``CASE WHEN`` row-expression, which
+ silently corrupts user-defined aggregates that natively accept multiple
+ arguments. Superset's sanitize / format paths normalize user SQL — they
+ do not transpile — so the emulation is undesirable here.
+ """
+ base_cls = Dialect.get_or_raise(dialect_name).generator_class
Review Comment:
**Suggestion:** `Dialect.get_or_raise` will raise when `dialect_name` is
`None`, but several supported engine specs are intentionally not present in
`SQLGLOT_DIALECTS` (for example `databend`, `denodo`, `kylin`, etc.). This
makes both `sanitize_clause()` and `SQLStatement.format()` fail at runtime for
those engines instead of falling back to base formatting. Add a safe fallback
dialect (for example base dialect) before calling `get_or_raise`. [api mismatch]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ Explore filters on Databend/Denodo/Kylin crash during validation.
- ❌ SQL Lab queries on unmapped engines fail during formatting.
- ⚠️ Cache-key SQL normalization broken for unmapped SQLGLOT dialects.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure a database using the Databend engine spec, whose `engine`
attribute is
`"databend"` (see `superset/db_engine_specs/databend.py:104-109` where
`DatabendEngineSpec.engine = "databend"`). This makes
`database.db_engine_spec.engine`
equal to `"databend"` throughout the code.
2. Create or open a chart on this Databend database and add a SQL filter in
the "WHERE" or
"HAVING" fields. During query execution,
`QueryContextProcessor.get_df_payload()` in
`superset/common/query_context_processor.py:3-10` calls
`query_obj.validate()`, which in
turn runs the loop shown in `superset/common/query_object.py:3-34` and
computes `engine =
database.db_engine_spec.engine` then calls `sanitized_clause =
sanitize_clause(clause,
engine)`.
3. Inside `sanitize_clause()` in `superset/sql/parse.py:1841-1854`,
`statement =
SQLStatement(clause, engine)` parses the clause using `SQLStatement._parse`
with `dialect
= SQLGLOT_DIALECTS.get(engine)`. For `engine="databend"`, the mapping
`SQLGLOT_DIALECTS`
in `superset/sql/parse.py:60-122` has `# "databend": ???` commented out and
no entry for
`"databend"`, so `SQLGLOT_DIALECTS.get("databend")` returns `None`.
4. Still in `sanitize_clause()`, the generator call `return
_normalized_generator(SQLGLOT_DIALECTS.get(engine), pretty=False,
comments=True).generate(statement._parsed, copy=True)` passes
`dialect_name=None` into
`_normalized_generator()`. In `_normalized_generator()` at
`superset/sql/parse.py:142-165`, `base_cls =
Dialect.get_or_raise(dialect_name).generator_class` attempts
`Dialect.get_or_raise(None)`,
which raises a runtime error because `None` is not a valid `DialectType`.
This exception
is not a `SupersetParseError`, so it is not caught by the `except
SupersetParseError` in
`sanitize_clause()`, causing an uncaught server error when validating
filters for Databend
(and similarly unmapped engines like `"denodo"` and `"kylin"`).
5. Separately, run any SQL Lab query against the same Databend database. In
SQL execution,
`_prepare_statement_blocks()` in
`superset/sql/execution/celery_task.py:130-18` calls
`parsed_script = SQLScript(rendered_query, engine=db_engine_spec.engine)`.
For
`engine="databend"`, `SQLScript.special_engines` in
`superset/sql/parse.py:1539-1544` does
not include `"databend"`, so `statement_class` defaults to `SQLStatement`
and each
statement is a `SQLStatement` instance.
6. In `SQLStatement.__init__` at `superset/sql/parse.py:701-709`,
`self._dialect =
SQLGLOT_DIALECTS.get(engine)` again sets `self._dialect = None` for
`"databend"`. Later,
when `_prepare_statement_blocks()` calls either `parsed_script.format(...)`
or
`statement.format(...)` (see `superset/sql/execution/celery_task.py:10-15`),
`SQLStatement.format()` at `superset/sql/parse.py:938-946` invokes
`_normalized_generator(self._dialect, pretty=True, comments=comments)`. With
`self._dialect is None`, `_normalized_generator()` again executes
`Dialect.get_or_raise(None)` and raises a runtime error, causing SQL Lab
query execution
for these engines to fail before any statements are sent to the database.
7. The same failure mode will affect any other engine with a defined
`db_engine_spec.engine` string but no corresponding entry in
`SQLGLOT_DIALECTS`, as
indicated by commented-out placeholders like `# "denodo": ???` and `#
"kylin": ???` in
`superset/sql/parse.py:70-91`. Once any of these engines is used with chart
filters or SQL
Lab, the missing fallback dialect leads to reproducible crashes in both
`sanitize_clause()` and `SQLStatement.format()`.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=cbf516bde7da4c1e808ede0fc1105396&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=cbf516bde7da4c1e808ede0fc1105396&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/sql/parse.py
**Line:** 159:159
**Comment:**
*Api Mismatch: `Dialect.get_or_raise` will raise when `dialect_name` is
`None`, but several supported engine specs are intentionally not present in
`SQLGLOT_DIALECTS` (for example `databend`, `denodo`, `kylin`, etc.). This
makes both `sanitize_clause()` and `SQLStatement.format()` fail at runtime for
those engines instead of falling back to base formatting. Add a safe fallback
dialect (for example base dialect) before calling `get_or_raise`.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39340&comment_hash=f8ac9b2e429c7bf183f2a71d948543209bc899f492f7bc16b007fc3a55ccd316&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F39340&comment_hash=f8ac9b2e429c7bf183f2a71d948543209bc899f492f7bc16b007fc3a55ccd316&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]