codeant-ai-for-open-source[bot] commented on code in PR #41120:
URL: https://github.com/apache/superset/pull/41120#discussion_r3422663849


##########
superset/models/helpers.py:
##########
@@ -1465,19 +1460,17 @@ def _raise_for_disallowed_sql(self, sql: str) -> None:
             disallowed_functions
         ):
             raise SupersetDisallowedSQLFunctionException(disallowed_functions)
-        if disallowed_tables and 
parsed_script.check_tables_present(disallowed_tables):
+        if disallowed_tables:
             # Report only the tables actually found in the query, mirroring the
             # canonical execution-time gate so the user-facing error doesn't
-            # echo the operator's full denylist.
-            present_tables = {
-                table.table.lower()
-                for statement in parsed_script.statements
-                for table in statement.tables
-            }
-            found_tables = {
-                table for table in disallowed_tables if table.lower() in 
present_tables
-            }
-            raise SupersetDisallowedSQLTableException(found_tables or 
disallowed_tables)
+            # echo the operator's full denylist. Honors schema-qualified
+            # denylist entries (e.g. ``information_schema.tables``) and 
resolves
+            # unqualified references against the query schema.
+            found_tables = parsed_script.get_disallowed_tables(
+                disallowed_tables, self.schema
+            )

Review Comment:
   **Suggestion:** This query-surface denylist enforcement passes `self.schema` 
directly, but datasource schema can be `None`; in that case unqualified table 
references are not matched to schema-qualified denylist entries. This creates 
inconsistent enforcement versus SQL Lab paths that compute an effective schema, 
and can allow blocked metadata tables through when only unqualified names are 
used. Use an effective schema fallback (`self.schema` or database default) 
before `get_disallowed_tables`. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Dataset queries can access denylisted metadata via unqualified names.
   - ❌ Denylist enforcement diverges between SQL Lab and Explore queries.
   - ⚠️ Weakens guarantees of `DISALLOWED_SQL_TABLES` in production.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction βœ… </b></summary>
   
   ```mdx
   1. Configure `DISALLOWED_SQL_TABLES` for the relevant engine (e.g. 
`"postgresql"`) to
   include a schema-qualified metadata entry such as 
`"information_schema.tables"` (same
   config read in `superset/models/helpers.py:248-251` and 
`superset/sql_lab.py:428-20`).
   
   2. Use or create a `SqlaTable` dataset whose `schema` is `NULL`/`None` in 
the database
   (nullable column defined at `superset/connectors/sqla/models.py:1330-1338`); 
this is
   common for virtual datasets (`sql` field populated, `schema` unset).
   
   3. Run a chart or Explore query against this dataset that references the 
unqualified table
   name `tables` (for example via a filter or metric that ultimately compiles 
into `FROM
   tables`). The query string is built in `SqlaTable.get_query_str_extended` at
   `superset/models/helpers.py:133-160` and then executed via 
`SqlaTable.query`, which calls
   `self._raise_for_disallowed_sql(sql)` at 
`superset/models/helpers.py:1483-29`.
   
   4. `_raise_for_disallowed_sql` in `superset/models/helpers.py:243-260` 
parses the SQL and
   calls `parsed_script.get_disallowed_tables(disallowed_tables, self.schema)` 
at lines
   1458-1471 with `self.schema is None`, so `default_schema` is `None` inside
   `SQLScript.get_disallowed_tables` (`superset/sql/parse.py:11-21`). As a 
result,
   unqualified `tables` is never resolved to `information_schema.tables`, the 
denylist match
   fails, `SupersetDisallowedSQLTableException` is not raised, and the dataset 
query can read
   denylisted metadata viewsβ€”unlike the SQL Lab path, which correctly computes
   `effective_schema = query.schema or database.get_default_schema(...)` before 
calling
   `get_disallowed_tables` (`superset/sql_lab.py:428-20`).
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=2344b70d8b4d449283519b910a625a90&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=2344b70d8b4d449283519b910a625a90&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent πŸ€– </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/models/helpers.py
   **Line:** 1469:1471
   **Comment:**
        *Security: This query-surface denylist enforcement passes `self.schema` 
directly, but datasource schema can be `None`; in that case unqualified table 
references are not matched to schema-qualified denylist entries. This creates 
inconsistent enforcement versus SQL Lab paths that compute an effective schema, 
and can allow blocked metadata tables through when only unqualified names are 
used. Use an effective schema fallback (`self.schema` or database default) 
before `get_disallowed_tables`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41120&comment_hash=cd335219390a83c2b0197bce07b82c2d498e74fbdff389a72ea2c1882328329d&reaction=like'>πŸ‘</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41120&comment_hash=cd335219390a83c2b0197bce07b82c2d498e74fbdff389a72ea2c1882328329d&reaction=dislike'>πŸ‘Ž</a>



##########
superset/models/helpers.py:
##########
@@ -1221,24 +1221,19 @@ def _process_sql_expression(  # pylint: 
disable=too-many-arguments
                     disallowed_functions
                 ):
                     raise 
SupersetDisallowedSQLFunctionException(disallowed_functions)
-                if disallowed_tables and 
parsed.check_tables_present(disallowed_tables):
+                if disallowed_tables:
                     # Report only the tables actually found in the expression,
                     # mirroring the canonical execution-time gate in
                     # `superset.sql_lab._validate_query` so the user-facing
-                    # error doesn't echo the operator's full denylist.
-                    present_tables = {
-                        table.table.lower()
-                        for statement in parsed.statements
-                        for table in statement.tables
-                    }
-                    found_tables = {
-                        table
-                        for table in disallowed_tables
-                        if table.lower() in present_tables
-                    }
-                    raise SupersetDisallowedSQLTableException(
-                        found_tables or disallowed_tables
+                    # error doesn't echo the operator's full denylist. Honors
+                    # schema-qualified denylist entries (e.g.
+                    # ``information_schema.tables``) and resolves unqualified
+                    # references against the query schema.
+                    found_tables = parsed.get_disallowed_tables(
+                        disallowed_tables, schema
                     )

Review Comment:
   **Suggestion:** The denylist check for adhoc expressions uses `schema` 
directly, but many call paths pass `self.schema`, which can be `None` for 
virtual datasets. When schema is unset, unqualified references are not resolved 
against the database default schema, so a schema-qualified denylist entry (for 
example `information_schema.tables`) can be bypassed by an unqualified table 
reference. Resolve an effective schema first (explicit schema or database 
default) before calling `get_disallowed_tables`. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Adhoc metrics can query denylisted metadata views.
   - ❌ Inconsistent denylist vs SQL Lab `effective_schema` enforcement.
   - ⚠️ Virtual datasets with NULL schema bypass schema-qualified blocks.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction βœ… </b></summary>
   
   ```mdx
   1. Ensure the denylist includes a schema-qualified metadata view, e.g. set
   `DISALLOWED_SQL_TABLES['postgresql']` to contain 
`"information_schema.tables"` (referenced
   in `superset/sql_lab.py:428-20` and used by other denylist checks).
   
   2. Create or use a `SqlaTable` dataset with `schema=None` (allowed by the 
nullable
   `schema` column at `superset/connectors/sqla/models.py:1330-1338`) and 
configure the
   backing Postgres database so the default schema/search_path at runtime is
   `information_schema` (so unqualified `tables` resolves to 
`information_schema.tables`).
   
   3. In Explore, add an adhoc SQL metric on that dataset whose expression 
references the
   unqualified table name, e.g. `SELECT COUNT(*) FROM tables`; this flows 
through
   `SqlaTable.adhoc_metric_to_sqla` at 
`superset/connectors/sqla/models.py:1640-51`, which
   calls `self._process_select_expression(..., schema=self.schema, ...)` with 
`schema=None`.
   
   4. `_process_sql_expression` in `superset/models/helpers.py:1208-30` 
constructs a
   `SQLScript` and calls `parsed.get_disallowed_tables(disallowed_tables, 
schema)` with
   `schema=None` (lines 1220-1227/1232-1234), so `default_schema` is `None` 
inside
   `SQLStatement.get_disallowed_tables` (`superset/sql/parse.py:62-119`). 
Because there is no
   fallback schema, the unqualified `tables` reference is never matched to the
   schema-qualified denylist entry `"information_schema.tables"`, and no
   `SupersetDisallowedSQLTableException` is raised, allowing the denylisted 
metadata to be
   queried via adhoc expressions.
   ```
   </details>
   
   [![Fix in 
Cursor](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-cursor-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=c49659b777004f61b4b73734b21966fa&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 [![Fix in VSCode 
Claude](https://new-codeant-butcket.s3.us-west-1.amazonaws.com/badges/fix-in-vscode-claude-flat.svg)](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=c49659b777004f61b4b73734b21966fa&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent πŸ€– </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/models/helpers.py
   **Line:** 1232:1234
   **Comment:**
        *Security: The denylist check for adhoc expressions uses `schema` 
directly, but many call paths pass `self.schema`, which can be `None` for 
virtual datasets. When schema is unset, unqualified references are not resolved 
against the database default schema, so a schema-qualified denylist entry (for 
example `information_schema.tables`) can be bypassed by an unqualified table 
reference. Resolve an effective schema first (explicit schema or database 
default) before calling `get_disallowed_tables`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41120&comment_hash=142b17ad0baa33e92323e81359d49181eb70e6ee20c842c4bf884f9a58ba5927&reaction=like'>πŸ‘</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F41120&comment_hash=142b17ad0baa33e92323e81359d49181eb70e6ee20c842c4bf884f9a58ba5927&reaction=dislike'>πŸ‘Ž</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to