Re: [PR] fix(dataset): unify validation for stored and adhoc SQL expressions [superset]

via GitHub Thu, 28 May 2026 10:14:24 -0700


codeant-ai-for-open-source[bot] commented on code in PR #40392:
URL: https://github.com/apache/superset/pull/40392#discussion_r3319513354



##########
superset/connectors/sqla/models.py:
##########
@@ -867,6 +870,57 @@ def values_for_column(self, column_name: str, limit: int = 
10000) -> list[Any]:
         raise NotImplementedError()
 
 
+def validate_stored_expression(
+    database: Database,
+    catalog: str | None,
+    schema: str | None,
+    expression: str | None,
+) -> None:
+    """
+    Apply the adhoc-expression validator to a stored column or metric 
expression.
+
+    Wrapping in a synthetic ``SELECT <expr>`` reuses the column-position parser
+    rules already enforced for adhoc expressions, so the same policy on
+    sub-queries, set operations, and multi-statement SQL applies to stored
+    expressions when they are saved.
+    """
+    if not expression:
+        return
+    engine = database.backend
+    wrapped = f"SELECT {expression}"
+
+    try:
+        parsed = SQLStatement(wrapped, engine)
+    except SupersetParseError as ex:
+        raise SupersetSecurityException(
+            SupersetError(
+                error_type=SupersetErrorType.ADHOC_SUBQUERY_NOT_ALLOWED_ERROR,
+                message=_(
+                    "Custom SQL fields cannot be parsed as a single SQL 
statement."
+                ),
+                level=ErrorLevel.ERROR,
+            )
+        ) from ex

Review Comment:
   **Suggestion:** This new validator parses stored expressions as raw SQL 
without processing Jinja template syntax first, so valid existing metric/column 
expressions like `{{ current_username() }}` or `{{ url_param(...) }}` will now 
fail dataset updates with a parse/security error. Preserve current behavior by 
handling templated expressions before SQL parsing (or safely replacing Jinja 
blocks before parsing) so supported stored Jinja expressions are not rejected. 
[api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ UpdateDatasetCommand rejects Jinja-based stored column expressions.
   - ❌ UpdateDatasetCommand rejects Jinja-based stored metric expressions.
   - ⚠️ Dataset edit API fails when Jinja expressions present.
   - ⚠️ Existing Jinja datasets become unsaveable when updating metadata.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Verify that stored dataset columns explicitly support Jinja expressions 
by inspecting
   `tests/integration_tests/sqla_models_tests.py:160-31`, where a `SqlaTable` 
is created and
   a `TableColumn` is added with `expression=\"case when '{{ current_username() 
}}' = 'abc'
   then 'yes' else 'no' end'\" and the query generated via 
`table.get_sqla_query()` executes
   correctly with Jinja templating applied.
   
   2. Use the existing unit-test pattern in
   `tests/unit_tests/commands/dataset/update_test.py:59-97`
   (`test_update_dataset_accepts_benign_expression`), but change the payload so 
that
   `payload['columns'][0]['expression']` is the same Jinja expression from step 
1 (e.g.
   `\"case when '{{ current_username() }}' = 'abc' then 'yes' else 'no' end\"`) 
instead of
   the plain `CASE` expression.
   
   3. Run `UpdateDatasetCommand(1, payload).validate()` as done in 
`update_test.py:59-97`;
   inside `UpdateDatasetCommand.validate` in 
`superset/commands/dataset/update.py:98-124`,
   `_validate_semantics` calls `_validate_expressions` (line ~292) which 
resolves `database`,
   `catalog`, and `schema` and then calls `validate_stored_expression(database, 
catalog,
   schema, expression)` for each item with a non-empty `expression` (see 
`update.py:55-61`
   where `validate_stored_expression` is invoked).
   
   4. In `validate_stored_expression` at 
`superset/connectors/sqla/models.py:873-83`, the
   function builds `wrapped = f"SELECT {expression}"` (line 890) and 
instantiates
   `SQLStatement(wrapped, engine)` (line 893) without running the expression 
through a Jinja
   `BaseTemplateProcessor`; `SQLStatement` in `superset/sql/parse.py:520-181` 
uses
   `sqlglot.parse` on the raw string and raises `SupersetParseError` when it 
encounters the
   Jinja tokens (`{{ current_username() }}`), which is caught and re-raised as
   `SupersetSecurityException` with message `"Custom SQL fields cannot be 
parsed as a single
   SQL statement."`. `_validate_expressions` catches this and appends a 
`ValidationError` on
   `columns.0.expression`, causing `UpdateDatasetCommand.validate()` (and thus 
dataset
   updates via this command) to fail for stored columns/metrics that currently 
and
   intentionally use Jinja expressions.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=069370558e1e4824941e4aab6379a270&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=069370558e1e4824941e4aab6379a270&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/connectors/sqla/models.py
   **Line:** 890:903
   **Comment:**
        *Api Mismatch: This new validator parses stored expressions as raw SQL 
without processing Jinja template syntax first, so valid existing metric/column 
expressions like `{{ current_username() }}` or `{{ url_param(...) }}` will now 
fail dataset updates with a parse/security error. Preserve current behavior by 
handling templated expressions before SQL parsing (or safely replacing Jinja 
blocks before parsing) so supported stored Jinja expressions are not rejected.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40392&comment_hash=7e6c248d2d97e5669cccfdbbdef88faf6dba82ebdaf519b10054f8b55fd5c56d&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40392&comment_hash=7e6c248d2d97e5669cccfdbbdef88faf6dba82ebdaf519b10054f8b55fd5c56d&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(dataset): unify validation for stored and adhoc SQL expressions [superset]

Reply via email to