codeant-ai-for-open-source[bot] commented on code in PR #40568:
URL: https://github.com/apache/superset/pull/40568#discussion_r3345977072
##########
superset/models/helpers.py:
##########
@@ -1192,6 +1194,51 @@ def _process_sql_expression( # pylint:
disable=too-many-arguments
expression = sanitize_clause(expression, engine)
except QueryClauseValidationException as ex:
raise QueryObjectValidationError(ex.message) from ex
+ # Adhoc expressions are user-controlled SQL that ends up inside a
+ # `literal_column(...)`. Apply the operator-configured
+ # `DISALLOWED_SQL_FUNCTIONS` / `DISALLOWED_SQL_TABLES` gates at the
+ # validation step so a dangerous function call (e.g. `version()`,
+ # `pg_read_file(...)`, `query_to_xml(...)`) is rejected before the
+ # expression is incorporated into the final SQL. This complements
+ # the same gate applied at query-execution time and gives the
+ # adhoc-expression path defense in depth.
+ disallowed_functions = app.config["DISALLOWED_SQL_FUNCTIONS"].get(
+ engine, set()
+ )
+ disallowed_tables =
app.config["DISALLOWED_SQL_TABLES"].get(engine, set())
+ if disallowed_functions or disallowed_tables:
+ # `_process_select_expression` (and siblings) pre-wraps the
+ # input with `SELECT ...`; other callers pass bare
+ # expressions. Detect and don't double-wrap, otherwise
+ # `SELECT SELECT ...` fails the sqlglot parse.
+ sql_to_check = (
+ expression
+ if expression.strip().upper().startswith("SELECT")
+ else f"SELECT {expression}"
+ )
+ parsed = SQLScript(sql_to_check, engine=engine)
+ if disallowed_functions and parsed.check_functions_present(
+ disallowed_functions
+ ):
+ raise
SupersetDisallowedSQLFunctionException(disallowed_functions)
+ if disallowed_tables and
parsed.check_tables_present(disallowed_tables):
+ # Report only the tables actually found in the expression,
+ # mirroring the canonical execution-time gate in
+ # `superset.sql_lab._validate_query` so the user-facing
+ # error doesn't echo the operator's full denylist.
+ present_tables = {
+ table.table.lower()
+ for statement in parsed.statements
+ for table in statement.tables
+ }
+ found_tables = {
+ table
+ for table in disallowed_tables
+ if table.lower() in present_tables
+ }
+ raise SupersetDisallowedSQLTableException(
+ found_tables or disallowed_tables
+ )
Review Comment:
**Suggestion:** Raising
`SupersetDisallowedSQLFunctionException`/`SupersetDisallowedSQLTableException`
directly from adhoc-expression validation changes this path from a query-object
validation failure into a propagated `SupersetErrorException` with default HTTP
status 500. In chart-data flows, validation errors are expected to be converted
to `QueryObjectValidationError` (400) and handled as user input issues; this
now surfaces as server errors instead of proper client validation responses.
Wrap these denylist failures into `QueryObjectValidationError` (or use
exception classes with a 4xx status) to preserve the existing API contract.
[api mismatch]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ POST /api/v1/chart/data returns 500 for denylisted SQL.
- ⚠️ Dashboards with such adhoc metrics show server error pages.
- ⚠️ Breaks contract expecting 400 for invalid chart queries.
```
</details>
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure engine-specific denylists in `app.config` (e.g. in
SUPERSET_CONFIG) so
`DISALLOWED_SQL_FUNCTIONS['postgresql'] = {'version'}` or
`DISALLOWED_SQL_TABLES['postgresql']` is non-empty, which
`_process_sql_expression` reads
at `superset/models/helpers.py:56-59`.
2. Issue a chart-data request to `POST /api/v1/chart/data` handled by
`ChartDataRestApi.data` in `superset/charts/data/api.py:34-123`, with a
query context
whose `form_data` contains an adhoc metric or column of
`expressionType='SQL'` and
`"sqlExpression": "version()"` (or a query referencing a denylisted table)
on a PostgreSQL
dataset.
3. The request is deserialized into a `QueryContext` by
`_create_query_context_from_form`
(`superset/charts/data/api.py:134-147`), wrapped in `ChartDataCommand`
(`superset/commands/chart/data/get_data_command.py:34-39`), and executed via
`command.run()` inside `_get_data_response`
(`superset/charts/data/api.py:65-83`), which
calls `QueryContext.get_payload()` -> `QueryContextProcessor.get_payload()`
and ultimately
`QueryContextProcessor.get_query_result()`
(`superset/common/query_context_processor.py:248-256`) to fetch data from
the datasource.
4. As the datasource builds SQL for the adhoc metric/column it calls
`_process_select_expression` in `superset/models/helpers.py:1244-1276`,
which delegates to
`_process_sql_expression` at `superset/models/helpers.py:26-93`; there, when
`parsed.check_functions_present(disallowed_functions)` or
`parsed.check_tables_present(disallowed_tables)` is true, it raises
`SupersetDisallowedSQLFunctionException` or
`SupersetDisallowedSQLTableException` directly
at lines 1223 and 1239–1241. These subclasses of `SupersetErrorException` do
not override
`status` and therefore default to HTTP 500 (see `SupersetException.status =
500` and the
definitions of these classes in `superset/exceptions.py:29-37` and
`superset/exceptions.py:9-36`). Since `ChartDataRestApi.data` only catches
`QueryObjectValidationError` and `ValidationError`
(`superset/charts/data/api.py:85-98`),
the denylist exceptions escape as uncaught server errors instead of being
converted into a
400-level `QueryObjectValidationError`, changing the API contract for
invalid user SQL in
chart-data flows.
```
</details>
[Fix in
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=56232a37fcf84e8d8e07de0bf2387cf3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
| [Fix in VSCode
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=56232a37fcf84e8d8e07de0bf2387cf3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
*(Use Cmd/Ctrl + Click for best experience)*
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/models/helpers.py
**Line:** 1220:1241
**Comment:**
*Api Mismatch: Raising
`SupersetDisallowedSQLFunctionException`/`SupersetDisallowedSQLTableException`
directly from adhoc-expression validation changes this path from a query-object
validation failure into a propagated `SupersetErrorException` with default HTTP
status 500. In chart-data flows, validation errors are expected to be converted
to `QueryObjectValidationError` (400) and handled as user input issues; this
now surfaces as server errors instead of proper client validation responses.
Wrap these denylist failures into `QueryObjectValidationError` (or use
exception classes with a 4xx status) to preserve the existing API contract.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask
user if the user wants to fix the rest of the comments as well. if said yes,
then fetch all the comments validate the correctness and implement a minimal fix
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40568&comment_hash=8ef0b2f650cae06e077f812ae4a9bf5f088ab915ad7bcac477701d9014174918&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40568&comment_hash=8ef0b2f650cae06e077f812ae4a9bf5f088ab915ad7bcac477701d9014174918&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]