codeant-ai-for-open-source[bot] commented on code in PR #40568:
URL: https://github.com/apache/superset/pull/40568#discussion_r3345977072


##########
superset/models/helpers.py:
##########
@@ -1192,6 +1194,51 @@ def _process_sql_expression(  # pylint: 
disable=too-many-arguments
                 expression = sanitize_clause(expression, engine)
             except QueryClauseValidationException as ex:
                 raise QueryObjectValidationError(ex.message) from ex
+            # Adhoc expressions are user-controlled SQL that ends up inside a
+            # `literal_column(...)`. Apply the operator-configured
+            # `DISALLOWED_SQL_FUNCTIONS` / `DISALLOWED_SQL_TABLES` gates at the
+            # validation step so a dangerous function call (e.g. `version()`,
+            # `pg_read_file(...)`, `query_to_xml(...)`) is rejected before the
+            # expression is incorporated into the final SQL. This complements
+            # the same gate applied at query-execution time and gives the
+            # adhoc-expression path defense in depth.
+            disallowed_functions = app.config["DISALLOWED_SQL_FUNCTIONS"].get(
+                engine, set()
+            )
+            disallowed_tables = 
app.config["DISALLOWED_SQL_TABLES"].get(engine, set())
+            if disallowed_functions or disallowed_tables:
+                # `_process_select_expression` (and siblings) pre-wraps the
+                # input with `SELECT ...`; other callers pass bare
+                # expressions. Detect and don't double-wrap, otherwise
+                # `SELECT SELECT ...` fails the sqlglot parse.
+                sql_to_check = (
+                    expression
+                    if expression.strip().upper().startswith("SELECT")
+                    else f"SELECT {expression}"
+                )
+                parsed = SQLScript(sql_to_check, engine=engine)
+                if disallowed_functions and parsed.check_functions_present(
+                    disallowed_functions
+                ):
+                    raise 
SupersetDisallowedSQLFunctionException(disallowed_functions)
+                if disallowed_tables and 
parsed.check_tables_present(disallowed_tables):
+                    # Report only the tables actually found in the expression,
+                    # mirroring the canonical execution-time gate in
+                    # `superset.sql_lab._validate_query` so the user-facing
+                    # error doesn't echo the operator's full denylist.
+                    present_tables = {
+                        table.table.lower()
+                        for statement in parsed.statements
+                        for table in statement.tables
+                    }
+                    found_tables = {
+                        table
+                        for table in disallowed_tables
+                        if table.lower() in present_tables
+                    }
+                    raise SupersetDisallowedSQLTableException(
+                        found_tables or disallowed_tables
+                    )

Review Comment:
   **Suggestion:** Raising 
`SupersetDisallowedSQLFunctionException`/`SupersetDisallowedSQLTableException` 
directly from adhoc-expression validation changes this path from a query-object 
validation failure into a propagated `SupersetErrorException` with default HTTP 
status 500. In chart-data flows, validation errors are expected to be converted 
to `QueryObjectValidationError` (400) and handled as user input issues; this 
now surfaces as server errors instead of proper client validation responses. 
Wrap these denylist failures into `QueryObjectValidationError` (or use 
exception classes with a 4xx status) to preserve the existing API contract. 
[api mismatch]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ POST /api/v1/chart/data returns 500 for denylisted SQL.
   - ⚠️ Dashboards with such adhoc metrics show server error pages.
   - ⚠️ Breaks contract expecting 400 for invalid chart queries.
   ```
   </details>
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure engine-specific denylists in `app.config` (e.g. in 
SUPERSET_CONFIG) so
   `DISALLOWED_SQL_FUNCTIONS['postgresql'] = {'version'}` or
   `DISALLOWED_SQL_TABLES['postgresql']` is non-empty, which 
`_process_sql_expression` reads
   at `superset/models/helpers.py:56-59`.
   
   2. Issue a chart-data request to `POST /api/v1/chart/data` handled by
   `ChartDataRestApi.data` in `superset/charts/data/api.py:34-123`, with a 
query context
   whose `form_data` contains an adhoc metric or column of 
`expressionType='SQL'` and
   `"sqlExpression": "version()"` (or a query referencing a denylisted table) 
on a PostgreSQL
   dataset.
   
   3. The request is deserialized into a `QueryContext` by 
`_create_query_context_from_form`
   (`superset/charts/data/api.py:134-147`), wrapped in `ChartDataCommand`
   (`superset/commands/chart/data/get_data_command.py:34-39`), and executed via
   `command.run()` inside `_get_data_response` 
(`superset/charts/data/api.py:65-83`), which
   calls `QueryContext.get_payload()` -> `QueryContextProcessor.get_payload()` 
and ultimately
   `QueryContextProcessor.get_query_result()`
   (`superset/common/query_context_processor.py:248-256`) to fetch data from 
the datasource.
   
   4. As the datasource builds SQL for the adhoc metric/column it calls
   `_process_select_expression` in `superset/models/helpers.py:1244-1276`, 
which delegates to
   `_process_sql_expression` at `superset/models/helpers.py:26-93`; there, when
   `parsed.check_functions_present(disallowed_functions)` or
   `parsed.check_tables_present(disallowed_tables)` is true, it raises
   `SupersetDisallowedSQLFunctionException` or 
`SupersetDisallowedSQLTableException` directly
   at lines 1223 and 1239–1241. These subclasses of `SupersetErrorException` do 
not override
   `status` and therefore default to HTTP 500 (see `SupersetException.status = 
500` and the
   definitions of these classes in `superset/exceptions.py:29-37` and
   `superset/exceptions.py:9-36`). Since `ChartDataRestApi.data` only catches
   `QueryObjectValidationError` and `ValidationError` 
(`superset/charts/data/api.py:85-98`),
   the denylist exceptions escape as uncaught server errors instead of being 
converted into a
   400-level `QueryObjectValidationError`, changing the API contract for 
invalid user SQL in
   chart-data flows.
   ```
   </details>
   
   [Fix in 
Cursor](https://app.codeant.ai/fix-in-ide?tool=cursor&prompt_id=56232a37fcf84e8d8e07de0bf2387cf3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
 | [Fix in VSCode 
Claude](https://app.codeant.ai/fix-in-ide?tool=vscode-claude&prompt_id=56232a37fcf84e8d8e07de0bf2387cf3&service=github&base_url=https%3A%2F%2Fgithub.com&org=apache&repo=apache%2Fsuperset)
   
   *(Use Cmd/Ctrl + Click for best experience)*
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/models/helpers.py
   **Line:** 1220:1241
   **Comment:**
        *Api Mismatch: Raising 
`SupersetDisallowedSQLFunctionException`/`SupersetDisallowedSQLTableException` 
directly from adhoc-expression validation changes this path from a query-object 
validation failure into a propagated `SupersetErrorException` with default HTTP 
status 500. In chart-data flows, validation errors are expected to be converted 
to `QueryObjectValidationError` (400) and handled as user input issues; this 
now surfaces as server errors instead of proper client validation responses. 
Wrap these denylist failures into `QueryObjectValidationError` (or use 
exception classes with a 4xx status) to preserve the existing API contract.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   Once fix is implemented, also check other comments on the same PR, and ask 
user if the user wants to fix the rest of the comments as well. if said yes, 
then fetch all the comments validate the correctness and implement a minimal fix
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40568&comment_hash=8ef0b2f650cae06e077f812ae4a9bf5f088ab915ad7bcac477701d9014174918&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F40568&comment_hash=8ef0b2f650cae06e077f812ae4a9bf5f088ab915ad7bcac477701d9014174918&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to