codeant-ai-for-open-source[bot] commented on code in PR #40421:
URL: https://github.com/apache/superset/pull/40421#discussion_r3324712120


##########
superset/config.py:
##########
@@ -1775,6 +1775,19 @@ def engine_context_manager(  # pylint: 
disable=unused-argument
         "pg_read_file",
         "pg_ls_dir",
         "pg_read_binary_file",
+        # PostgreSQL large-object functions: writers can plant arbitrary
+        # bytes on the server filesystem (lo_export, lo_from_bytea, lowrite,
+        # lo_put, lo_create, lo_import) and readers can pull bytes back out
+        # (lo_get, loread). Defense-in-depth on top of is_mutating()'s
+        # function-name check.
+        "lo_from_bytea",
+        "lo_export",
+        "lo_import",
+        "lo_put",
+        "lo_create",
+        "lowrite",
+        "lo_get",
+        "loread",

Review Comment:
   **Suggestion:** The PostgreSQL disallowed function list is incomplete for 
large-object mutation paths: `lo_unlink` is not included, so environments 
relying on `DISALLOWED_SQL_FUNCTIONS` defense-in-depth can still execute 
destructive large-object operations via function calls. Add `lo_unlink` to the 
PostgreSQL set. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ SQL Lab allows lo_unlink despite function denylist config.
   - ⚠️ Executor-based function checks also miss lo_unlink.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Inspect the PostgreSQL disallowed-function configuration in 
`superset/config.py:20-65`,
   where `DISALLOWED_SQL_FUNCTIONS["postgresql"]` includes `lo_from_bytea`, 
`lo_export`,
   `lo_import`, `lo_put`, `lo_create`, `lowrite`, `lo_get`, and `loread` but 
does NOT include
   `lo_unlink`.
   
   2. Note how SQL Lab enforces this denylist in `superset/sql_lab.py:406-414`: 
it pulls
   `disallowed_functions = 
app.config["DISALLOWED_SQL_FUNCTIONS"].get(db_engine_spec.engine,
   set())` and calls 
`parsed_script.check_functions_present(disallowed_functions)`, raising
   `SupersetDisallowedSQLFunctionException(disallowed_functions)` only when a 
listed function
   appears.
   
   3. Observe the function detection implementation in 
`superset/sql/parse.py:345-388`
   (`SQLStatement.check_functions_present`), which walks the AST
   (`self._parsed.find_all(exp.Func)`) and collects SQL function names like 
`LO_UNLINK` into
   the `present` set when they appear in a query.
   
   4. Run a query like `SELECT lo_unlink(12345);` in SQL Lab against a 
PostgreSQL database
   whose engine name is `"postgresql"`: `check_functions_present` will see 
`LO_UNLINK` in the
   AST, but because `DISALLOWED_SQL_FUNCTIONS["postgresql"]` does not contain 
`"lo_unlink"`,
   the method returns `False`, no `SupersetDisallowedSQLFunctionException` is 
raised, and the
   destructive `lo_unlink` operation is allowed despite the intended 
defense-in-depth
   denylist.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=a60c0e2a145045ff87e7478166ba15b5&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=a60c0e2a145045ff87e7478166ba15b5&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/config.py
   **Line:** 1778:1790
   **Comment:**
        *Security: The PostgreSQL disallowed function list is incomplete for 
large-object mutation paths: `lo_unlink` is not included, so environments 
relying on `DISALLOWED_SQL_FUNCTIONS` defense-in-depth can still execute 
destructive large-object operations via function calls. Add `lo_unlink` to the 
PostgreSQL set.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40421&comment_hash=39d7419dee77a2857df9e0fa077efeb2f924fbebbac7c31a13a7fd0c6f6fa1b7&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40421&comment_hash=39d7419dee77a2857df9e0fa077efeb2f924fbebbac7c31a13a7fd0c6f6fa1b7&reaction=dislike'>👎</a>



##########
superset/sql/parse.py:
##########
@@ -562,6 +562,62 @@ class SQLStatement(BaseSQLStatement[exp.Expression]):
     This class is used for all engines with dialects that can be parsed using 
sqlglot.
     """
 
+    # Function names that mutate server-side state but appear in the AST as
+    # plain function calls inside a non-mutating wrapper. Used by
+    # ``is_mutating()`` to classify e.g. PostgreSQL large-object writers.
+    # Names are uppercased for comparison.
+    _MUTATING_FUNCTION_NAMES: frozenset[str] = frozenset(
+        {
+            "LO_FROM_BYTEA",
+            "LO_EXPORT",
+            "LO_IMPORT",
+            "LO_PUT",
+            "LO_CREATE",
+            "LOWRITE",
+            # PostgreSQL sequence mutators. `SELECT setval('seq', N)` looks
+            # like a read but changes sequence state for every subsequent
+            # `nextval` caller.
+            "SETVAL",
+        }
+    )

Review Comment:
   **Suggestion:** The mutating-function denylist is incomplete: `lo_unlink` is 
a PostgreSQL large-object writer (it deletes large objects) but is missing from 
`_MUTATING_FUNCTION_NAMES`. A query like `SELECT lo_unlink(...)` will not be 
classified as mutating and can bypass the read-only `allow_dml=False` guard. 
Add `LO_UNLINK` to the mutating function set. [security]
   
   <details>
   <summary><b>Severity Level:</b> Critical 🚨</summary>
   
   ```mdx
   - ❌ Read-only databases can execute lo_unlink mutations.
   - ❌ DML guard in SQL Lab is bypassed for lo_unlink.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Inspect the mutating-function set in `superset/sql/parse.py:569-582`, 
where
   `SQLStatement._MUTATING_FUNCTION_NAMES` includes `LO_FROM_BYTEA`, 
`LO_EXPORT`,
   `LO_IMPORT`, `LO_PUT`, `LO_CREATE`, `LOWRITE`, and `SETVAL`, but omits 
`LO_UNLINK`, even
   though `lo_unlink` deletes large objects in PostgreSQL.
   
   2. Follow the mutation classification logic in `SQLStatement.is_mutating` at
   `superset/sql/parse.py:734-217`: after checking AST node types, it walks
   `self._parsed.find_all(exp.Func)` and returns `True` when `name in
   self._MUTATING_FUNCTION_NAMES`; because `LO_UNLINK` is not in that set, 
`SELECT
   lo_unlink(...)` is treated as non-mutating.
   
   3. See how script-level mutation is derived in `SQLScript.has_mutation` at
   `superset/sql/parse.py:49-55`, which returns `any(statement.is_mutating() 
for statement in
   self.statements)`, and how SQL Lab enforces read-only mode in
   `superset/sql_lab.py:429-431` via `if parsed_script.has_mutation() and not
   database.allow_dml: raise SupersetDMLNotAllowedException()`.
   
   4. Configure a PostgreSQL database in Superset with `database.allow_dml` set 
to `False`,
   then execute `SELECT lo_unlink(12345);` in SQL Lab: `SQLScript` parses the 
script using
   the `"postgresql"` engine, `SQLStatement.is_mutating` returns `False` 
because `LO_UNLINK`
   is not in `_MUTATING_FUNCTION_NAMES`, `parsed_script.has_mutation()` is 
`False`, the
   `SupersetDMLNotAllowedException` guard at `superset/sql_lab.py:429-431` is 
not triggered,
   and the supposedly read-only connection executes a mutating `lo_unlink` 
operation.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=855ef03db5184636a882edaa9eaddf50&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=855ef03db5184636a882edaa9eaddf50&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/sql/parse.py
   **Line:** 569:582
   **Comment:**
        *Security: The mutating-function denylist is incomplete: `lo_unlink` is 
a PostgreSQL large-object writer (it deletes large objects) but is missing from 
`_MUTATING_FUNCTION_NAMES`. A query like `SELECT lo_unlink(...)` will not be 
classified as mutating and can bypass the read-only `allow_dml=False` guard. 
Add `LO_UNLINK` to the mutating function set.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40421&comment_hash=ff50aead1334d7d8402317988ed4af0a9b15b8a015546b269dfd563daf4eb8d8&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40421&comment_hash=ff50aead1334d7d8402317988ed4af0a9b15b8a015546b269dfd563daf4eb8d8&reaction=dislike'>👎</a>



##########
superset/sql/parse.py:
##########
@@ -829,17 +917,45 @@ def check_functions_present(self, functions: set[str]) -> 
bool:
             else:
                 present.add(function.name.upper())
 
+        # MySQL `@@<name>` syntax (also Oracle/SQL-Server `@@name`) parses as
+        # `exp.SessionParameter`, which is *not* a subclass of `exp.Func`, so
+        # the walk above misses it. Include those names so denylist entries
+        # like `version` or `hostname` match `SELECT @@version`.
+        for param in self._parsed.find_all(exp.SessionParameter):
+            present.add(param.name.upper())
+
         return any(function.upper() in present for function in functions)
 
     def check_tables_present(self, tables: set[str]) -> bool:
         """
         Check if any of the given tables are present in the statement.
 
+        Denylist entries may be bare (``pg_stat_activity``) or
+        schema-qualified (``information_schema.tables``). Bare entries
+        match by table name regardless of schema; qualified entries
+        require the schema to match too. This lets us block all access
+        to ``information_schema`` without also blocking any
+        user-authored table that happens to be named ``tables``.
+
         :param tables: Set of table names to check for (case-insensitive)
-        :return: True if any of the tables are present
-        """
-        present = {table.table.lower() for table in self.tables}
-        return any(table.lower() in present for table in tables)
+        :return: True if any of the given tables is referenced
+        """
+        present_bare: set[str] = set()
+        present_qualified: set[str] = set()
+        for t in self.tables:
+            bare = t.table.lower()
+            present_bare.add(bare)
+            if t.schema:
+                present_qualified.add(f"{t.schema.lower()}.{bare}")
+        for entry in tables:
+            needle = entry.lower()
+            if "." in needle:
+                if needle in present_qualified:
+                    return True
+            else:
+                if needle in present_bare:
+                    return True
+        return False

Review Comment:
   **Suggestion:** This adds schema-qualified table matching, but the SQL Lab 
caller that builds `found_tables` still only compares bare table names. As a 
result, when a qualified denylist entry (like `information_schema.tables`) is 
matched, the exception often reports the entire denylist instead of the actual 
referenced table. Update the caller contract to return/report matched qualified 
entries, not only bare names. [api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ SQL Lab error lists all disallowed tables, not matched ones.
   - ⚠️ Users cannot see which system view triggered rejection.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Review the PostgreSQL table denylist in `superset/config.py:127-176`, 
where
   `DISALLOWED_SQL_TABLES["postgresql"]` now includes schema-qualified entries 
such as
   `"information_schema.tables"`, `"information_schema.columns"`, and others in 
addition to
   bare names like `"pg_stat_activity"`.
   
   2. Examine the updated `SQLStatement.check_tables_present` implementation in
   `superset/sql/parse.py:390-419`, which builds `present_bare` (table names) 
and
   `present_qualified` (e.g. `"information_schema.tables"`) and returns `True` 
whenever a
   denylist entry, including schema-qualified ones, is present in the parsed 
script.
   
   3. Look at the SQL Lab enforcement in `superset/sql_lab.py:415-428`: after
   `parsed_script.check_tables_present(disallowed_tables)` returns `True`, it 
recomputes
   `found_tables` using only bare names via `present = {table.table.lower() for 
table in
   statement.tables}` and checks `if table.lower() in present:` for each 
denylist entry; for
   a denylist entry like `"information_schema.tables"`, `table.table.lower()` 
is just
   `"tables"`, so `"information_schema.tables"` is never added to 
`found_tables`.
   
   4. Run a query such as `SELECT * FROM information_schema.tables;` in SQL Lab 
against a
   PostgreSQL database: `check_tables_present` returns `True` thanks to the 
schema-qualified
   matching in `present_qualified`, but `found_tables` remains empty due to the 
bare-name
   comparison, so `SupersetDisallowedSQLTableException` at 
`superset/sql_lab.py:427` is
   raised with `found_tables or disallowed_tables`, causing the error to report 
the entire
   `DISALLOWED_SQL_TABLES["postgresql"]` set instead of accurately listing only
   `information_schema.tables`.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=38879d183bea49dda87d4bffad7c574e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=38879d183bea49dda87d4bffad7c574e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/sql/parse.py
   **Line:** 945:958
   **Comment:**
        *Api Mismatch: This adds schema-qualified table matching, but the SQL 
Lab caller that builds `found_tables` still only compares bare table names. As 
a result, when a qualified denylist entry (like `information_schema.tables`) is 
matched, the exception often reports the entire denylist instead of the actual 
referenced table. Update the caller contract to return/report matched qualified 
entries, not only bare names.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40421&comment_hash=09e83ffada428dbc51b22d247e7235e21072473d3c38f14abcf1102214166948&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40421&comment_hash=09e83ffada428dbc51b22d247e7235e21072473d3c38f14abcf1102214166948&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to