codeant-ai-for-open-source[bot] commented on code in PR #40963:
URL: https://github.com/apache/superset/pull/40963#discussion_r3393871442


##########
tests/unit_tests/sql/execution/test_executor.py:
##########
@@ -350,6 +350,60 @@ def test_execute_allowed_functions(
     assert result.status == QueryStatus.SUCCESS
 
 
+def test_execute_disallowed_function_not_matched_in_identifiers(
+    mocker: MockerFixture, database: Database, app_context: None
+) -> None:
+    """Identifiers containing disallowed function names should not be 
blocked."""
+    mock_query_execution(
+        mocker,
+        database,
+        return_data=[(1, "Alice", "Alice Full", 5)],
+        column_names=["id", "name", "fullname", "skilllevel"],
+    )
+    mocker.patch.object(database.db_engine_spec, "engine", "mysql")
+    mocker.patch.dict(
+        current_app.config,
+        {
+            "SQL_QUERY_MUTATOR": None,
+            "SQLLAB_TIMEOUT": 30,
+            "SQL_MAX_ROW": None,
+            "DISALLOWED_SQL_FUNCTIONS": {"mysql": {"kill"}},
+            "QUERY_LOGGER": None,
+        },
+    )
+
+    sql = (
+        "SELECT r.id, r.name, r.fullname, s.skilllevel "
+        "FROM ppb_db.PPB_OMD_SKILLS s "
+        "JOIN ppb_db.PPB_OMD_RESOURCES r ON s.resource_id = r.id "
+        "WHERE s.SkillName = 'ELIS'"
+    )
+    result = database.execute(sql)
+
+    assert result.status == QueryStatus.SUCCESS
+
+
+def test_execute_disallowed_mysql_kill_function(
+    mocker: MockerFixture, database: Database, app_context: None
+) -> None:
+    """Actual disallowed function calls should still be blocked."""
+    mocker.patch.object(database.db_engine_spec, "engine", "mysql")
+    mocker.patch.dict(
+        current_app.config,
+        {
+            "SQL_QUERY_MUTATOR": None,
+            "SQLLAB_TIMEOUT": 30,
+            "DISALLOWED_SQL_FUNCTIONS": {"mysql": {"kill"}},
+        },
+    )
+
+    result = database.execute("SELECT KILL(123)")
+
+    assert result.status == QueryStatus.FAILED
+    assert result.error_message is not None
+    assert "kill" in result.error_message.lower()

Review Comment:
   **Suggestion:** This test can pass even when the disallowed-function 
security check is broken, because `SELECT KILL(123)` can fail at database 
execution time and still produce a FAILED status with an error containing 
`kill`. That makes the assertion non-specific and can hide regressions in the 
guard. Assert the specific disallowed-function error text (or assert execution 
was never attempted) so the test validates the intended security path. [logic 
error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Security regression may bypass tests for disallowed functions.
   - ⚠️ Disallowed-function guard could break without unit-test detection.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Locate `test_execute_disallowed_mysql_kill_function` in
   `tests/unit_tests/sql/execution/test_executor.py:67-85`. It patches
   `database.db_engine_spec.engine` to `"mysql"` and configures
   `current_app.config["DISALLOWED_SQL_FUNCTIONS"] = {"mysql": {"kill"}}`, then 
calls `result
   = database.execute("SELECT KILL(123)")`.
   
   2. The `database` fixture in 
`tests/unit_tests/sql/execution/conftest.py:114-121`
   constructs a real `Database` pointing at `sqlite://`, so `Database.execute()`
   (superset/models/core.py:1349-1363) delegates to 
`SQLExecutor(self).execute(sql, options)`
   in `superset/sql/execution/executor.py:204-237`.
   
   3. In `SQLExecutor.execute()`, `_check_security(transformed_script)` 
(executor.py:236-253)
   calls `_check_disallowed_functions(script)` (executor.py:40-64); if this 
detects `kill`,
   it raises `SupersetSecurityException` with a `SupersetError` whose message 
is `"Disallowed
   SQL functions: kill"`, and the `except SupersetSecurityException` block at
   executor.py:99-106 converts that into a FAILED `QueryResult` whose 
`error_message` is the
   string containing `"kill"`.
   
   4. If `_check_disallowed_functions` is accidentally broken to always return 
`None` (e.g.,
   temporarily `return None` at executor.py:55), then `_check_security` will 
not raise, and
   execution continues to `_execute_statements()` which sends `"SELECT 
KILL(123)"` to the
   underlying sqlite database. SQLite will raise a DBAPI error such as `"no 
such function:
   KILL"`, which is caught by the generic `except Exception as ex` branch at
   executor.py:111-113, and `database.db_engine_spec.extract_error_message(ex)` 
will return a
   message containing `"KILL"`/`"kill"`.
   
   5. In both scenarios—(a) security exception `Disallowed SQL functions: kill` 
or (b)
   DB-level `"no such function: KILL"` error—the test's assertions at
   `tests/unit_tests/sql/execution/test_executor.py:81-85` (`status == FAILED`,
   `error_message is not None`, and `"kill" in result.error_message.lower()`) 
all pass, so
   the test does not distinguish whether the failure came from the intended
   disallowed-function guard or from a generic SQL execution error.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=1e58e3ab97ad4a8b8dfcc8a2064f9e9d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=1e58e3ab97ad4a8b8dfcc8a2064f9e9d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** tests/unit_tests/sql/execution/test_executor.py
   **Line:** 400:404
   **Comment:**
        *Logic Error: This test can pass even when the disallowed-function 
security check is broken, because `SELECT KILL(123)` can fail at database 
execution time and still produce a FAILED status with an error containing 
`kill`. That makes the assertion non-specific and can hide regressions in the 
guard. Assert the specific disallowed-function error text (or assert execution 
was never attempted) so the test validates the intended security path.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40963&comment_hash=c6d1c445a18ac1a679bc433ec5e38cfa4ac63b493008cdf1b7cfda8eacc8f620&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40963&comment_hash=c6d1c445a18ac1a679bc433ec5e38cfa4ac63b493008cdf1b7cfda8eacc8f620&reaction=dislike'>👎</a>



##########
superset/sql/execution/executor.py:
##########
@@ -691,16 +691,16 @@ def _check_disallowed_functions(self, script: SQLScript) 
-> set[str] | None:
         if not engine_disallowed:
             return None
 
-        # Check each statement for disallowed functions
-        found = set()
-        for statement in script.statements:
-            # Use the statement's AST to check for function calls
-            statement_str = str(statement).upper()
-            for func in engine_disallowed:
-                if func.upper() in statement_str:
-                    found.add(func)
+        if not script.check_functions_present(engine_disallowed):
+            return None
 
-        return found if found else None
+        # Return only disallowed functions actually invoked in the query.
+        found = {
+            func
+            for func in engine_disallowed
+            if script.check_functions_present({func})
+        }
+        return found or None

Review Comment:
   **Suggestion:** The new implementation walks the SQL AST once for the full 
set and then again once per disallowed function, which multiplies work on every 
query. Because `check_functions_present` traverses parsed statements/functions 
each call, this becomes O(number_of_disallowed_functions × AST_size) and can 
noticeably slow execution when blocklists grow. Compute present function names 
once and intersect with the disallowed set instead of repeatedly calling 
`check_functions_present`. [performance]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ⚠️ Disallowed-function checks rescan AST multiple times per execution.
   - ⚠️ Queries on blocklisted databases incur unnecessary CPU overhead.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure disallowed SQL functions for an engine, e.g. set 
`DISALLOWED_SQL_FUNCTIONS =
   {"sqlite": {"LOAD_EXTENSION", "VERSION", "KILL"}}` in `superset_config.py` 
as read in
   `superset/sql/execution/executor.py:47-53` inside 
`_check_disallowed_functions`.
   
   2. Execute any query through the public API `Database.execute()` (e.g.
   
`tests/unit_tests/sql/execution/test_executor.py:test_execute_disallowed_sqlite_load_extension_function`
   at lines 13-31), which calls `superset.models.core.Database.execute()` 
(lines 1349-1363)
   and then `SQLExecutor.execute()` in 
`superset/sql/execution/executor.py:204-237`.
   
   3. In `SQLExecutor.execute()`, `_prepare_sql()` builds a `SQLScript` from 
the SQL, and
   `_check_security()` (executor.py:236-253) is invoked with the parsed script, 
which in turn
   calls `_check_disallowed_functions(script)` (executor.py:40-64).
   
   4. Inside `_check_disallowed_functions`,
   `script.check_functions_present(engine_disallowed)` (executor.py:55) calls
   `SQLScript.check_functions_present()` in `superset/sql/parse.py:95-105`, 
which loops over
   all statements and calls `SQLStatement.check_functions_present(functions)`
   (`parse.py:831-867`) to walk the sqlglot AST and build the `present` 
function-name set by
   iterating over every `exp.Func`.
   
   5. The comprehension at executor.py:59-63 then iterates over every function 
in
   `engine_disallowed` and calls `script.check_functions_present({func})` for 
each, causing
   `SQLStatement.check_functions_present` to traverse the AST and rebuild the 
`present` set
   once per disallowed function in addition to the initial call.
   
   6. With N disallowed functions, every query executed on a database with a 
configured
   disallowed-function blocklist incurs N+1 full AST scans of all function 
nodes, producing
   O(N × AST_size) behavior in the security check that could be reduced to a 
single AST
   traversal followed by a set intersection.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=a2f41fe0c248416998426f70a0c8f59e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=a2f41fe0c248416998426f70a0c8f59e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/sql/execution/executor.py
   **Line:** 694:703
   **Comment:**
        *Performance: The new implementation walks the SQL AST once for the 
full set and then again once per disallowed function, which multiplies work on 
every query. Because `check_functions_present` traverses parsed 
statements/functions each call, this becomes O(number_of_disallowed_functions × 
AST_size) and can noticeably slow execution when blocklists grow. Compute 
present function names once and intersect with the disallowed set instead of 
repeatedly calling `check_functions_present`.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40963&comment_hash=06601400abc2008249f652e47718cd6d83a0b5057f291530bb20f25f51fd6004&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40963&comment_hash=06601400abc2008249f652e47718cd6d83a0b5057f291530bb20f25f51fd6004&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to