codeant-ai-for-open-source[bot] commented on code in PR #41120:
URL: https://github.com/apache/superset/pull/41120#discussion_r3422663849
##########
superset/models/helpers.py:
##########
@@ -1465,19 +1460,17 @@ def _raise_for_disallowed_sql(self, sql: str) -> None:
disallowed_functions
):
raise SupersetDisallowedSQLFunctionException(disallowed_functions)
- if disallowed_tables and
parsed_script.check_tables_present(disallowed_tables):
+ if disallowed_tables:
# Report only the tables actually found in the query, mirroring the
# canonical execution-time gate so the user-facing error doesn't
- # echo the operator's full denylist.
- present_tables = {
- table.table.lower()
- for statement in parsed_script.statements
- for table in statement.tables
- }
- found_tables = {
- table for table in disallowed_tables if table.lower() in
present_tables
- }
- raise SupersetDisallowedSQLTableException(found_tables or
disallowed_tables)
+ # echo the operator's full denylist. Honors schema-qualified
+ # denylist entries (e.g. ``information_schema.tables``) and
resolves
+ # unqualified references against the query schema.
+ found_tables = parsed_script.get_disallowed_tables(
+ disallowed_tables, self.schema
+ )
Review Comment:
**Suggestion:** This query-surface denylist enforcement passes `self.schema`
directly, but datasource schema can be `None`; in that case unqualified table
references are not matched to schema-qualified denylist entries. This creates
inconsistent enforcement versus SQL Lab paths that compute an effective schema,
and can allow blocked metadata tables through when only unqualified names are
used. Use an effective schema fallback (`self.schema` or database default)
before `get_disallowed_tables`. [security]
<details>
<summary><b>Severity Level:</b> Critical π¨</summary>
```mdx
- β Dataset queries can access denylisted metadata via unqualified names.
- β Denylist enforcement diverges between SQL Lab and Explore queries.
- β οΈ Weakens guarantees of `DISALLOWED_SQL_TABLES` in production.
```
</details>
<details>
<summary><b>Steps of Reproduction β
</b></summary>
```mdx
1. Configure `DISALLOWED_SQL_TABLES` for the relevant engine (e.g.
`"postgresql"`) to
include a schema-qualified metadata entry such as
`"information_schema.tables"` (same
config read in `superset/models/helpers.py:248-251` and
`superset/sql_lab.py:428-20`).
2. Use or create a `SqlaTable` dataset whose `schema` is `NULL`/`None` in
the database
(nullable column defined at `superset/connectors/sqla/models.py:1330-1338`);
this is
common for virtual datasets (`sql` field populated, `schema` unset).
3. Run a chart or Explore query against this dataset that references the
unqualified table
name `tables` (for example via a filter or metric that ultimately compiles
into `FROM
tables`). The query string is built in `SqlaTable.get_query_str_extended` at
`superset/models/helpers.py:133-160` and then executed via
`SqlaTable.query`, which calls
`self._raise_for_disallowed_sql(sql)` at
`superset/models/helpers.py:1483-29`.
4. `_raise_for_disallowed_sql` in `superset/models/helpers.py:243-260`
parses the SQL and
calls `parsed_script.get_disallowed_tables(disallowed_tables, self.schema)`
at lines
1458-1471 with `self.schema is None`, so `default_schema` is `None` inside
`SQLScript.get_disallowed_tables` (`superset/sql/parse.py:11-21`). As a
result,
unqualified `tables` is never resolved to `information_schema.tables`, the
denylist match
fails, `SupersetDisallowedSQLTableException` is not raised, and the dataset
query can read
denylisted metadata viewsβunlike the SQL Lab path, which correctly computes
`effective_schema = query.schema or database.get_default_schema(...)` before
calling
`get_disallowed_tables` (`superset/sql_lab.py:428-20`).
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=2344b70d8b4d449283519b910a625a90&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=2344b70d8b4d449283519b910a625a90&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent π€ </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/models/helpers.py
**Line:** 1469:1471
**Comment:**
*Security: This query-surface denylist enforcement passes `self.schema`
directly, but datasource schema can be `None`; in that case unqualified table
references are not matched to schema-qualified denylist entries. This creates
inconsistent enforcement versus SQL Lab paths that compute an effective schema,
and can allow blocked metadata tables through when only unqualified names are
used. Use an effective schema fallback (`self.schema` or database default)
before `get_disallowed_tables`.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41120&comment_hash=cd335219390a83c2b0197bce07b82c2d498e74fbdff389a72ea2c1882328329d&reaction=like'>π</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41120&comment_hash=cd335219390a83c2b0197bce07b82c2d498e74fbdff389a72ea2c1882328329d&reaction=dislike'>π</a>
##########
superset/models/helpers.py:
##########
@@ -1221,24 +1221,19 @@ def _process_sql_expression( # pylint:
disable=too-many-arguments
disallowed_functions
):
raise
SupersetDisallowedSQLFunctionException(disallowed_functions)
- if disallowed_tables and
parsed.check_tables_present(disallowed_tables):
+ if disallowed_tables:
# Report only the tables actually found in the expression,
# mirroring the canonical execution-time gate in
# `superset.sql_lab._validate_query` so the user-facing
- # error doesn't echo the operator's full denylist.
- present_tables = {
- table.table.lower()
- for statement in parsed.statements
- for table in statement.tables
- }
- found_tables = {
- table
- for table in disallowed_tables
- if table.lower() in present_tables
- }
- raise SupersetDisallowedSQLTableException(
- found_tables or disallowed_tables
+ # error doesn't echo the operator's full denylist. Honors
+ # schema-qualified denylist entries (e.g.
+ # ``information_schema.tables``) and resolves unqualified
+ # references against the query schema.
+ found_tables = parsed.get_disallowed_tables(
+ disallowed_tables, schema
)
Review Comment:
**Suggestion:** The denylist check for adhoc expressions uses `schema`
directly, but many call paths pass `self.schema`, which can be `None` for
virtual datasets. When schema is unset, unqualified references are not resolved
against the database default schema, so a schema-qualified denylist entry (for
example `information_schema.tables`) can be bypassed by an unqualified table
reference. Resolve an effective schema first (explicit schema or database
default) before calling `get_disallowed_tables`. [security]
<details>
<summary><b>Severity Level:</b> Critical π¨</summary>
```mdx
- β Adhoc metrics can query denylisted metadata views.
- β Inconsistent denylist vs SQL Lab `effective_schema` enforcement.
- β οΈ Virtual datasets with NULL schema bypass schema-qualified blocks.
```
</details>
<details>
<summary><b>Steps of Reproduction β
</b></summary>
```mdx
1. Ensure the denylist includes a schema-qualified metadata view, e.g. set
`DISALLOWED_SQL_TABLES['postgresql']` to contain
`"information_schema.tables"` (referenced
in `superset/sql_lab.py:428-20` and used by other denylist checks).
2. Create or use a `SqlaTable` dataset with `schema=None` (allowed by the
nullable
`schema` column at `superset/connectors/sqla/models.py:1330-1338`) and
configure the
backing Postgres database so the default schema/search_path at runtime is
`information_schema` (so unqualified `tables` resolves to
`information_schema.tables`).
3. In Explore, add an adhoc SQL metric on that dataset whose expression
references the
unqualified table name, e.g. `SELECT COUNT(*) FROM tables`; this flows
through
`SqlaTable.adhoc_metric_to_sqla` at
`superset/connectors/sqla/models.py:1640-51`, which
calls `self._process_select_expression(..., schema=self.schema, ...)` with
`schema=None`.
4. `_process_sql_expression` in `superset/models/helpers.py:1208-30`
constructs a
`SQLScript` and calls `parsed.get_disallowed_tables(disallowed_tables,
schema)` with
`schema=None` (lines 1220-1227/1232-1234), so `default_schema` is `None`
inside
`SQLStatement.get_disallowed_tables` (`superset/sql/parse.py:62-119`).
Because there is no
fallback schema, the unqualified `tables` reference is never matched to the
schema-qualified denylist entry `"information_schema.tables"`, and no
`SupersetDisallowedSQLTableException` is raised, allowing the denylisted
metadata to be
queried via adhoc expressions.
```
</details>
[](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=c49659b777004f61b4b73734b21966fa&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
[](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=c49659b777004f61b4b73734b21966fa&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent π€ </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/models/helpers.py
**Line:** 1232:1234
**Comment:**
*Security: The denylist check for adhoc expressions uses `schema`
directly, but many call paths pass `self.schema`, which can be `None` for
virtual datasets. When schema is unset, unqualified references are not resolved
against the database default schema, so a schema-qualified denylist entry (for
example `information_schema.tables`) can be bypassed by an unqualified table
reference. Resolve an effective schema first (explicit schema or database
default) before calling `get_disallowed_tables`.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41120&comment_hash=142b17ad0baa33e92323e81359d49181eb70e6ee20c842c4bf884f9a58ba5927&reaction=like'>π</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41120&comment_hash=142b17ad0baa33e92323e81359d49181eb70e6ee20c842c4bf884f9a58ba5927&reaction=dislike'>π</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]