codeant-ai-for-open-source[bot] commented on code in PR #40409:
URL: https://github.com/apache/superset/pull/40409#discussion_r3325869517
##########
superset/models/sql_lab.py:
##########
@@ -306,10 +306,15 @@ def raise_for_access(self) -> None:
"""
Raise an exception if the user cannot access the resource.
+ Re-validation of a SQL Lab query uses the same strict scoping as the
+ initial execute path (``force_dataset_match=True``) so that fetching
+ results, exporting CSV, and streaming-exporting all enforce the same
+ per-table dataset-match requirement.
+
:raises SupersetSecurityException: If the user cannot access the
resource
"""
- security_manager.raise_for_access(query=self)
+ security_manager.raise_for_access(query=self, force_dataset_match=True)
Review Comment:
**Suggestion:** Re-validation now enforces strict dataset matching but does
not pass any template parameters from the stored query. For Jinja SQL where
table/catalog/schema is parameterized, access checks during
results/CSV/streaming can evaluate a different table set than the original
execute-time check (or fail parsing), causing incorrect denials or mismatched
authorization. Pass parsed `template_params` from the query record into
`raise_for_access` to keep re-validation aligned with execution. [api mismatch]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ SQL Lab results re-check may deny valid Jinja queries.
- ❌ CSV export re-check may deny valid Jinja queries.
- ❌ Streaming export re-check may deny valid Jinja queries.
- ⚠️ Authorization logic diverges between execute and re-validation.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. A user executes a SQL Lab query with Jinja template parameters via the
SQL Lab UI,
which ultimately constructs a `Query` model and runs
`CanAccessQueryValidatorImpl.validate` at
`superset/sqllab/validators.py:29-36`.
2. During this initial execute preflight,
`CanAccessQueryValidatorImpl.validate` calls
`security_manager.raise_for_access(query=query,
template_params=template_params,
force_dataset_match=True)` (`superset/sqllab/validators.py:31-37`), so
`raise_for_access`
uses `template_params` when calling `database.get_default_schema_for_query`
and
`process_jinja_sql(query.sql, database, template_params)`
(`superset/security/manager.py:98-103`), and strict dataset matching runs
against the
Jinja-rendered query actually executed.
3. Later, the same user fetches results, exports CSV, or starts a streaming
export, which
calls `self._query.raise_for_access()` from
`superset/commands/sql_lab/results.py:19-21`,
`superset/commands/sql_lab/export.py:13-15`, or
`superset/commands/sql_lab/streaming_export_command.py:11-13`.
4. `Query.raise_for_access` in `superset/models/sql_lab.py:46-58`
re-validates by calling
`security_manager.raise_for_access(query=self, force_dataset_match=True)`
(`line 317` in
the PR hunk) without passing any `template_params`, so `raise_for_access`
re-runs
`get_default_schema_for_query` and `process_jinja_sql` with
`template_params=None`. This
causes the strict dataset-matching logic in
`superset/security/manager.py:98-131` to
authorize against a potentially different rendered SQL/table set than was
used at execute
time (or to treat templated tables as opaque/empty), leading to mismatched
authorization
decisions between execution and result/CSV/streaming re-validation for
Jinja-templated SQL
Lab queries.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=67eb671e0d2e4e389385c48985ceaa94&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=67eb671e0d2e4e389385c48985ceaa94&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/models/sql_lab.py
**Line:** 317:317
**Comment:**
*Api Mismatch: Re-validation now enforces strict dataset matching but
does not pass any template parameters from the stored query. For Jinja SQL
where table/catalog/schema is parameterized, access checks during
results/CSV/streaming can evaluate a different table set than the original
execute-time check (or fail parsing), causing incorrect denials or mismatched
authorization. Pass parsed `template_params` from the query record into
`raise_for_access` to keep re-validation aligned with execution.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40409&comment_hash=73c5023a557c1c6a19d820c00010efb81aae854752330f7bb2e2dc2f0797506d&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40409&comment_hash=73c5023a557c1c6a19d820c00010efb81aae854752330f7bb2e2dc2f0797506d&reaction=dislike'>👎</a>
##########
superset/security/manager.py:
##########
@@ -2928,35 +2938,82 @@ def raise_for_access( # noqa: C901
cast(Query, query),
template_params,
)
+ parse_result = process_jinja_sql(query.sql, database,
template_params)
+ # Under strict scoping, refuse any statement the parser
+ # could not fully model (sqlglot exp.Command). Such
+ # statements may reference tables via dynamic SQL or
+ # vendor syntax that extract_tables_from_statement cannot
+ # see, so the dataset-match check below would be blind to
+ # them. Fail closed.
+ if (
+ force_dataset_match
+ and parse_result.script.has_unparseable_statement
+ ):
+ raise SupersetSecurityException(
+ SupersetError(
+
error_type=SupersetErrorType.QUERY_SECURITY_ACCESS_ERROR,
+ message=_(
+ "SQL Lab cannot authorise a statement that "
+ "could not be fully parsed. Qualify tables "
+ "explicitly and avoid dynamic SQL inside "
+ "stored-procedure or vendor-specific calls."
+ ),
+ level=ErrorLevel.ERROR,
+ )
+ )
Review Comment:
**Suggestion:** The strict-mode fail-closed check only rejects
`has_unparseable_statement`, but non-sqlglot engines (like KQL) are explicitly
skipped by that property and also return no extracted tables, so
`force_dataset_match=True` can still allow access without any dataset match.
Add an explicit strict denial path when table extraction is unsupported/opaque
for the engine instead of relying only on `exp.Command` detection. [security]
<details>
<summary><b>Severity Level:</b> Critical 🚨</summary>
```mdx
- ❌ SQL Lab KQL queries bypass strict per-dataset authorization.
- ❌ KQL CSV export ignores force_dataset_match dataset constraints.
- ❌ KQL streaming export ignores force_dataset_match dataset constraints.
- ⚠️ Behavior contradicts documented fail-closed strict scoping.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure a database in Superset using the Kusto KQL engine
(`engine='kustokql'`), so
that SQL parsing is handled by `KustoKQLStatement` via
`SQLScript.special_engines` in
`superset/sql/parse.py:1337-1356`, and execute a SQL Lab query against it,
which triggers
`CanAccessQueryValidatorImpl.validate`
(`superset/sqllab/validators.py:29-36`).
2. The validator calls `security_manager.raise_for_access(query=query,
template_params=template_params, force_dataset_match=True)`
(`superset/sqllab/validators.py:31-37`), which enters the database/query
branch in
`SupersetSecurityManager.raise_for_access`
(`superset/security/manager.py:75-102`) and
computes `parse_result = process_jinja_sql(query.sql, database,
template_params)`
(`superset/security/manager.py:102`).
3. For a KQL database, `process_jinja_sql` builds a `SQLScript` with
`engine='kustokql'`
(`superset/sql/parse.py:1629-1633`), so its `statements` are
`KustoKQLStatement`
instances. `SQLScript.has_unparseable_statement`
(`superset/sql/parse.py:1367-16`)
explicitly skips non-`SQLStatement` entries (`if not isinstance(statement,
SQLStatement):
continue`), so `has_unparseable_statement` is always `False` for KQL
scripts, while
`KustoKQLStatement._extract_tables_from_statement` logs that "Kusto KQL
doesn't support
table extraction. This means that data access roles will not be enforced by
Superset in
the database." and returns an empty set (`superset/sql/parse.py:1176-20`),
leaving
`parse_result.tables` empty.
4. Back in `raise_for_access`, the strict-mode guard `if
(force_dataset_match and
parse_result.script.has_unparseable_statement): raise
SupersetSecurityException(...)`
(`superset/security/manager.py:109-124`, lines 2948-2963 in the PR) does not
fire for KQL
because `has_unparseable_statement` is `False`, and the subsequent loop `for
table_ in
tables:` over `tables` (derived from `parse_result.tables`) never executes
because
`tables` is empty. As a result, `denied` remains empty and no
`SupersetSecurityException`
is raised, so SQL Lab query execute, results, CSV export, and streaming
export all
effectively bypass the intended `force_dataset_match=True` strict dataset
matching for KQL
and any other non-sqlglot engines that do not support table extraction.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=86eadd2bd7be4d4aa3cac9eef36276e3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=86eadd2bd7be4d4aa3cac9eef36276e3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/security/manager.py
**Line:** 2948:2963
**Comment:**
*Security: The strict-mode fail-closed check only rejects
`has_unparseable_statement`, but non-sqlglot engines (like KQL) are explicitly
skipped by that property and also return no extracted tables, so
`force_dataset_match=True` can still allow access without any dataset match.
Add an explicit strict denial path when table extraction is unsupported/opaque
for the engine instead of relying only on `exp.Command` detection.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40409&comment_hash=dad22b6acbdb6028bd5653a157a0c7ee3eb08dea1b60b3eb5e30c14416564d88&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40409&comment_hash=dad22b6acbdb6028bd5653a157a0c7ee3eb08dea1b60b3eb5e30c14416564d88&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]