codeant-ai-for-open-source[bot] commented on code in PR #40963:
URL: https://github.com/apache/superset/pull/40963#discussion_r3393871442
##########
tests/unit_tests/sql/execution/test_executor.py:
##########
@@ -350,6 +350,60 @@ def test_execute_allowed_functions(
assert result.status == QueryStatus.SUCCESS
+def test_execute_disallowed_function_not_matched_in_identifiers(
+ mocker: MockerFixture, database: Database, app_context: None
+) -> None:
+ """Identifiers containing disallowed function names should not be
blocked."""
+ mock_query_execution(
+ mocker,
+ database,
+ return_data=[(1, "Alice", "Alice Full", 5)],
+ column_names=["id", "name", "fullname", "skilllevel"],
+ )
+ mocker.patch.object(database.db_engine_spec, "engine", "mysql")
+ mocker.patch.dict(
+ current_app.config,
+ {
+ "SQL_QUERY_MUTATOR": None,
+ "SQLLAB_TIMEOUT": 30,
+ "SQL_MAX_ROW": None,
+ "DISALLOWED_SQL_FUNCTIONS": {"mysql": {"kill"}},
+ "QUERY_LOGGER": None,
+ },
+ )
+
+ sql = (
+ "SELECT r.id, r.name, r.fullname, s.skilllevel "
+ "FROM ppb_db.PPB_OMD_SKILLS s "
+ "JOIN ppb_db.PPB_OMD_RESOURCES r ON s.resource_id = r.id "
+ "WHERE s.SkillName = 'ELIS'"
+ )
+ result = database.execute(sql)
+
+ assert result.status == QueryStatus.SUCCESS
+
+
+def test_execute_disallowed_mysql_kill_function(
+ mocker: MockerFixture, database: Database, app_context: None
+) -> None:
+ """Actual disallowed function calls should still be blocked."""
+ mocker.patch.object(database.db_engine_spec, "engine", "mysql")
+ mocker.patch.dict(
+ current_app.config,
+ {
+ "SQL_QUERY_MUTATOR": None,
+ "SQLLAB_TIMEOUT": 30,
+ "DISALLOWED_SQL_FUNCTIONS": {"mysql": {"kill"}},
+ },
+ )
+
+ result = database.execute("SELECT KILL(123)")
+
+ assert result.status == QueryStatus.FAILED
+ assert result.error_message is not None
+ assert "kill" in result.error_message.lower()
Review Comment:
**Suggestion:** This test can pass even when the disallowed-function
security check is broken, because `SELECT KILL(123)` can fail at database
execution time and still produce a FAILED status with an error containing
`kill`. That makes the assertion non-specific and can hide regressions in the
guard. Assert the specific disallowed-function error text (or assert execution
was never attempted) so the test validates the intended security path. [logic
error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Security regression may bypass tests for disallowed functions.
- ⚠️ Disallowed-function guard could break without unit-test detection.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Locate `test_execute_disallowed_mysql_kill_function` in
`tests/unit_tests/sql/execution/test_executor.py:67-85`. It patches
`database.db_engine_spec.engine` to `"mysql"` and configures
`current_app.config["DISALLOWED_SQL_FUNCTIONS"] = {"mysql": {"kill"}}`, then
calls `result
= database.execute("SELECT KILL(123)")`.
2. The `database` fixture in
`tests/unit_tests/sql/execution/conftest.py:114-121`
constructs a real `Database` pointing at `sqlite://`, so `Database.execute()`
(superset/models/core.py:1349-1363) delegates to
`SQLExecutor(self).execute(sql, options)`
in `superset/sql/execution/executor.py:204-237`.
3. In `SQLExecutor.execute()`, `_check_security(transformed_script)`
(executor.py:236-253)
calls `_check_disallowed_functions(script)` (executor.py:40-64); if this
detects `kill`,
it raises `SupersetSecurityException` with a `SupersetError` whose message
is `"Disallowed
SQL functions: kill"`, and the `except SupersetSecurityException` block at
executor.py:99-106 converts that into a FAILED `QueryResult` whose
`error_message` is the
string containing `"kill"`.
4. If `_check_disallowed_functions` is accidentally broken to always return
`None` (e.g.,
temporarily `return None` at executor.py:55), then `_check_security` will
not raise, and
execution continues to `_execute_statements()` which sends `"SELECT
KILL(123)"` to the
underlying sqlite database. SQLite will raise a DBAPI error such as `"no
such function:
KILL"`, which is caught by the generic `except Exception as ex` branch at
executor.py:111-113, and `database.db_engine_spec.extract_error_message(ex)`
will return a
message containing `"KILL"`/`"kill"`.
5. In both scenarios—(a) security exception `Disallowed SQL functions: kill`
or (b)
DB-level `"no such function: KILL"` error—the test's assertions at
`tests/unit_tests/sql/execution/test_executor.py:81-85` (`status == FAILED`,
`error_message is not None`, and `"kill" in result.error_message.lower()`)
all pass, so
the test does not distinguish whether the failure came from the intended
disallowed-function guard or from a generic SQL execution error.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=1e58e3ab97ad4a8b8dfcc8a2064f9e9d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=1e58e3ab97ad4a8b8dfcc8a2064f9e9d&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** tests/unit_tests/sql/execution/test_executor.py
**Line:** 400:404
**Comment:**
*Logic Error: This test can pass even when the disallowed-function
security check is broken, because `SELECT KILL(123)` can fail at database
execution time and still produce a FAILED status with an error containing
`kill`. That makes the assertion non-specific and can hide regressions in the
guard. Assert the specific disallowed-function error text (or assert execution
was never attempted) so the test validates the intended security path.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40963&comment_hash=c6d1c445a18ac1a679bc433ec5e38cfa4ac63b493008cdf1b7cfda8eacc8f620&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40963&comment_hash=c6d1c445a18ac1a679bc433ec5e38cfa4ac63b493008cdf1b7cfda8eacc8f620&reaction=dislike'>👎</a>
##########
superset/sql/execution/executor.py:
##########
@@ -691,16 +691,16 @@ def _check_disallowed_functions(self, script: SQLScript)
-> set[str] | None:
if not engine_disallowed:
return None
- # Check each statement for disallowed functions
- found = set()
- for statement in script.statements:
- # Use the statement's AST to check for function calls
- statement_str = str(statement).upper()
- for func in engine_disallowed:
- if func.upper() in statement_str:
- found.add(func)
+ if not script.check_functions_present(engine_disallowed):
+ return None
- return found if found else None
+ # Return only disallowed functions actually invoked in the query.
+ found = {
+ func
+ for func in engine_disallowed
+ if script.check_functions_present({func})
+ }
+ return found or None
Review Comment:
**Suggestion:** The new implementation walks the SQL AST once for the full
set and then again once per disallowed function, which multiplies work on every
query. Because `check_functions_present` traverses parsed statements/functions
each call, this becomes O(number_of_disallowed_functions × AST_size) and can
noticeably slow execution when blocklists grow. Compute present function names
once and intersect with the disallowed set instead of repeatedly calling
`check_functions_present`. [performance]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ⚠️ Disallowed-function checks rescan AST multiple times per execution.
- ⚠️ Queries on blocklisted databases incur unnecessary CPU overhead.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure disallowed SQL functions for an engine, e.g. set
`DISALLOWED_SQL_FUNCTIONS =
{"sqlite": {"LOAD_EXTENSION", "VERSION", "KILL"}}` in `superset_config.py`
as read in
`superset/sql/execution/executor.py:47-53` inside
`_check_disallowed_functions`.
2. Execute any query through the public API `Database.execute()` (e.g.
`tests/unit_tests/sql/execution/test_executor.py:test_execute_disallowed_sqlite_load_extension_function`
at lines 13-31), which calls `superset.models.core.Database.execute()`
(lines 1349-1363)
and then `SQLExecutor.execute()` in
`superset/sql/execution/executor.py:204-237`.
3. In `SQLExecutor.execute()`, `_prepare_sql()` builds a `SQLScript` from
the SQL, and
`_check_security()` (executor.py:236-253) is invoked with the parsed script,
which in turn
calls `_check_disallowed_functions(script)` (executor.py:40-64).
4. Inside `_check_disallowed_functions`,
`script.check_functions_present(engine_disallowed)` (executor.py:55) calls
`SQLScript.check_functions_present()` in `superset/sql/parse.py:95-105`,
which loops over
all statements and calls `SQLStatement.check_functions_present(functions)`
(`parse.py:831-867`) to walk the sqlglot AST and build the `present`
function-name set by
iterating over every `exp.Func`.
5. The comprehension at executor.py:59-63 then iterates over every function
in
`engine_disallowed` and calls `script.check_functions_present({func})` for
each, causing
`SQLStatement.check_functions_present` to traverse the AST and rebuild the
`present` set
once per disallowed function in addition to the initial call.
6. With N disallowed functions, every query executed on a database with a
configured
disallowed-function blocklist incurs N+1 full AST scans of all function
nodes, producing
O(N × AST_size) behavior in the security check that could be reduced to a
single AST
traversal followed by a set intersection.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=a2f41fe0c248416998426f70a0c8f59e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=a2f41fe0c248416998426f70a0c8f59e&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/sql/execution/executor.py
**Line:** 694:703
**Comment:**
*Performance: The new implementation walks the SQL AST once for the
full set and then again once per disallowed function, which multiplies work on
every query. Because `check_functions_present` traverses parsed
statements/functions each call, this becomes O(number_of_disallowed_functions ×
AST_size) and can noticeably slow execution when blocklists grow. Compute
present function names once and intersect with the disallowed set instead of
repeatedly calling `check_functions_present`.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40963&comment_hash=06601400abc2008249f652e47718cd6d83a0b5057f291530bb20f25f51fd6004&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40963&comment_hash=06601400abc2008249f652e47718cd6d83a0b5057f291530bb20f25f51fd6004&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]