Re: [PR] fix(sqllab): require dataset match for raw query access [superset]

via GitHub Fri, 29 May 2026 10:14:12 -0700


codeant-ai-for-open-source[bot] commented on code in PR #40409:
URL: https://github.com/apache/superset/pull/40409#discussion_r3325869517



##########
superset/models/sql_lab.py:
##########
@@ -306,10 +306,15 @@ def raise_for_access(self) -> None:
         """
         Raise an exception if the user cannot access the resource.
 
+        Re-validation of a SQL Lab query uses the same strict scoping as the
+        initial execute path (``force_dataset_match=True``) so that fetching
+        results, exporting CSV, and streaming-exporting all enforce the same
+        per-table dataset-match requirement.
+
         :raises SupersetSecurityException: If the user cannot access the 
resource
         """
 
-        security_manager.raise_for_access(query=self)
+        security_manager.raise_for_access(query=self, force_dataset_match=True)

Review Comment:
   **Suggestion:** Re-validation now enforces strict dataset matching but does 
not pass any template parameters from the stored query. For Jinja SQL where 
table/catalog/schema is parameterized, access checks during 
results/CSV/streaming can evaluate a different table set than the original 
execute-time check (or fail parsing), causing incorrect denials or mismatched 
authorization. Pass parsed `template_params` from the query record into 
`raise_for_access` to keep re-validation aligned with execution. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ SQL Lab results re-check may deny valid Jinja queries.
   - ❌ CSV export re-check may deny valid Jinja queries.
   - ❌ Streaming export re-check may deny valid Jinja queries.
   - ⚠️ Authorization logic diverges between execute and re-validation.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. A user executes a SQL Lab query with Jinja template parameters via the 
SQL Lab UI,
   which ultimately constructs a `Query` model and runs
   `CanAccessQueryValidatorImpl.validate` at 
`superset/sqllab/validators.py:29-36`.
   
   2. During this initial execute preflight, 
`CanAccessQueryValidatorImpl.validate` calls
   `security_manager.raise_for_access(query=query, 
template_params=template_params,
   force_dataset_match=True)` (`superset/sqllab/validators.py:31-37`), so 
`raise_for_access`
   uses `template_params` when calling `database.get_default_schema_for_query` 
and
   `process_jinja_sql(query.sql, database, template_params)`
   (`superset/security/manager.py:98-103`), and strict dataset matching runs 
against the
   Jinja-rendered query actually executed.
   
   3. Later, the same user fetches results, exports CSV, or starts a streaming 
export, which
   calls `self._query.raise_for_access()` from 
`superset/commands/sql_lab/results.py:19-21`,
   `superset/commands/sql_lab/export.py:13-15`, or
   `superset/commands/sql_lab/streaming_export_command.py:11-13`.
   
   4. `Query.raise_for_access` in `superset/models/sql_lab.py:46-58` 
re-validates by calling
   `security_manager.raise_for_access(query=self, force_dataset_match=True)` 
(`line 317` in
   the PR hunk) without passing any `template_params`, so `raise_for_access` 
re-runs
   `get_default_schema_for_query` and `process_jinja_sql` with 
`template_params=None`. This
   causes the strict dataset-matching logic in 
`superset/security/manager.py:98-131` to
   authorize against a potentially different rendered SQL/table set than was 
used at execute
   time (or to treat templated tables as opaque/empty), leading to mismatched 
authorization
   decisions between execution and result/CSV/streaming re-validation for 
Jinja-templated SQL
   Lab queries.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=67eb671e0d2e4e389385c48985ceaa94&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=67eb671e0d2e4e389385c48985ceaa94&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/models/sql_lab.py
   **Line:** 317:317
   **Comment:**
        *Api Mismatch: Re-validation now enforces strict dataset matching but 
does not pass any template parameters from the stored query. For Jinja SQL 
where table/catalog/schema is parameterized, access checks during 
results/CSV/streaming can evaluate a different table set than the original 
execute-time check (or fail parsing), causing incorrect denials or mismatched 
authorization. Pass parsed `template_params` from the query record into 
`raise_for_access` to keep re-validation aligned with execution.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40409&comment_hash=73c5023a557c1c6a19d820c00010efb81aae854752330f7bb2e2dc2f0797506d&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40409&comment_hash=73c5023a557c1c6a19d820c00010efb81aae854752330f7bb2e2dc2f0797506d&reaction=dislike'>👎</a>



##########
superset/security/manager.py:
##########
@@ -2928,35 +2938,82 @@ def raise_for_access(  # noqa: C901
                     cast(Query, query),
                     template_params,
                 )
+                parse_result = process_jinja_sql(query.sql, database, 
template_params)
+                # Under strict scoping, refuse any statement the parser
+                # could not fully model (sqlglot exp.Command). Such
+                # statements may reference tables via dynamic SQL or
+                # vendor syntax that extract_tables_from_statement cannot
+                # see, so the dataset-match check below would be blind to
+                # them. Fail closed.
+                if (
+                    force_dataset_match
+                    and parse_result.script.has_unparseable_statement
+                ):
+                    raise SupersetSecurityException(
+                        SupersetError(
+                            
error_type=SupersetErrorType.QUERY_SECURITY_ACCESS_ERROR,
+                            message=_(
+                                "SQL Lab cannot authorise a statement that "
+                                "could not be fully parsed. Qualify tables "
+                                "explicitly and avoid dynamic SQL inside "
+                                "stored-procedure or vendor-specific calls."
+                            ),
+                            level=ErrorLevel.ERROR,
+                        )
+                    )

Review Comment:
   **Suggestion:** The strict-mode fail-closed check only rejects 
`has_unparseable_statement`, but non-sqlglot engines (like KQL) are explicitly 
skipped by that property and also return no extracted tables, so 
`force_dataset_match=True` can still allow access without any dataset match. 
Add an explicit strict denial path when table extraction is unsupported/opaque 
for the engine instead of relying only on `exp.Command` detection. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ SQL Lab KQL queries bypass strict per-dataset authorization.
   - ❌ KQL CSV export ignores force_dataset_match dataset constraints.
   - ❌ KQL streaming export ignores force_dataset_match dataset constraints.
   - ⚠️ Behavior contradicts documented fail-closed strict scoping.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a database in Superset using the Kusto KQL engine 
(`engine='kustokql'`), so
   that SQL parsing is handled by `KustoKQLStatement` via 
`SQLScript.special_engines` in
   `superset/sql/parse.py:1337-1356`, and execute a SQL Lab query against it, 
which triggers
   `CanAccessQueryValidatorImpl.validate` 
(`superset/sqllab/validators.py:29-36`).
   
   2. The validator calls `security_manager.raise_for_access(query=query,
   template_params=template_params, force_dataset_match=True)`
   (`superset/sqllab/validators.py:31-37`), which enters the database/query 
branch in
   `SupersetSecurityManager.raise_for_access` 
(`superset/security/manager.py:75-102`) and
   computes `parse_result = process_jinja_sql(query.sql, database, 
template_params)`
   (`superset/security/manager.py:102`).
   
   3. For a KQL database, `process_jinja_sql` builds a `SQLScript` with 
`engine='kustokql'`
   (`superset/sql/parse.py:1629-1633`), so its `statements` are 
`KustoKQLStatement`
   instances. `SQLScript.has_unparseable_statement` 
(`superset/sql/parse.py:1367-16`)
   explicitly skips non-`SQLStatement` entries (`if not isinstance(statement, 
SQLStatement):
   continue`), so `has_unparseable_statement` is always `False` for KQL 
scripts, while
   `KustoKQLStatement._extract_tables_from_statement` logs that "Kusto KQL 
doesn't support
   table extraction. This means that data access roles will not be enforced by 
Superset in
   the database." and returns an empty set (`superset/sql/parse.py:1176-20`), 
leaving
   `parse_result.tables` empty.
   
   4. Back in `raise_for_access`, the strict-mode guard `if 
(force_dataset_match and
   parse_result.script.has_unparseable_statement): raise 
SupersetSecurityException(...)`
   (`superset/security/manager.py:109-124`, lines 2948-2963 in the PR) does not 
fire for KQL
   because `has_unparseable_statement` is `False`, and the subsequent loop `for 
table_ in
   tables:` over `tables` (derived from `parse_result.tables`) never executes 
because
   `tables` is empty. As a result, `denied` remains empty and no 
`SupersetSecurityException`
   is raised, so SQL Lab query execute, results, CSV export, and streaming 
export all
   effectively bypass the intended `force_dataset_match=True` strict dataset 
matching for KQL
   and any other non-sqlglot engines that do not support table extraction.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=86eadd2bd7be4d4aa3cac9eef36276e3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=86eadd2bd7be4d4aa3cac9eef36276e3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/security/manager.py
   **Line:** 2948:2963
   **Comment:**
        *Security: The strict-mode fail-closed check only rejects 
`has_unparseable_statement`, but non-sqlglot engines (like KQL) are explicitly 
skipped by that property and also return no extracted tables, so 
`force_dataset_match=True` can still allow access without any dataset match. 
Add an explicit strict denial path when table extraction is unsupported/opaque 
for the engine instead of relying only on `exp.Command` detection.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40409&comment_hash=dad22b6acbdb6028bd5653a157a0c7ee3eb08dea1b60b3eb5e30c14416564d88&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40409&comment_hash=dad22b6acbdb6028bd5653a157a0c7ee3eb08dea1b60b3eb5e30c14416564d88&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix(sqllab): require dataset match for raw query access [superset]

Reply via email to