codeant-ai-for-open-source[bot] commented on code in PR #38990:
URL: https://github.com/apache/superset/pull/38990#discussion_r3018999302
##########
superset/mcp_service/dashboard/schemas.py:
##########
@@ -448,6 +449,14 @@ class GenerateDashboardRequest(BaseModel):
default=True, description="Whether to publish the dashboard"
)
+ @field_validator("dashboard_title")
+ @classmethod
+ def sanitize_dashboard_title(cls, v: str | None) -> str | None:
+ """Sanitize dashboard title to prevent XSS attacks."""
+ return sanitize_user_input(
+ v, "Dashboard title", max_length=255, allow_empty=True
+ )
Review Comment:
**Suggestion:** Using `sanitize_user_input()` for dashboard titles
introduces a regression because it rejects benign text containing `data:` (and
similar scheme-like substrings), so valid titles such as "Sales data: Q1" now
fail validation. For dashboard titles, keep HTML/XSS stripping but avoid
URL-scheme blocking logic intended for SQL-sensitive inputs. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ MCP `generate_dashboard` tool rejects some harmless dashboard titles.
- ⚠️ Users must avoid common patterns like "Sales data: Q1".
- ⚠️ Automated agents may see unexpected validation failures on title.
```
</details>
```suggestion
if v is None:
return None
value = v.strip()
if not value:
return None
if len(value) > 255:
raise ValueError(
f"Dashboard title too long ({len(value)} characters). "
"Maximum allowed length is 255 characters."
)
import html
import nh3
max_iterations = 100
decoded = value
prev = None
iterations = 0
while prev != decoded and iterations < max_iterations:
prev = decoded
decoded = html.unescape(decoded)
iterations += 1
return nh3.clean(decoded, tags=set(),
url_schemes=set()).replace("&", "&")
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Call the MCP tool function `generate_dashboard` defined in
`superset/mcp_service/dashboard/tool/generate_dashboard.py:190-441` via the
MCP service
(this is the entry point decorated with `@tool`, taking a
`GenerateDashboardRequest`
request object).
2. Provide a JSON payload that deserializes into `GenerateDashboardRequest`
(schema in
`superset/mcp_service/dashboard/schemas.py:35-59`) with at least one valid
chart ID in
`chart_ids` and a title such as `"Sales data: Q1"` in the `dashboard_title`
field.
3. During request parsing, Pydantic constructs a `GenerateDashboardRequest`
instance and
runs the `dashboard_title` field validator `sanitize_dashboard_title`
implemented at
`superset/mcp_service/dashboard/schemas.py:53-59`, which delegates to
`sanitize_user_input(v, "Dashboard title", max_length=255,
allow_empty=True)`.
4. Inside `sanitize_user_input` in
`superset/mcp_service/utils/sanitization.py:145-209`,
after trimming and length checks, `_check_dangerous_patterns` is called at
`sanitization.py:199-200`, which uses the regex
`r"\b(javascript|vbscript|data):"`
(`sanitization.py:93-95`). The substring `"data:"` in `"Sales data: Q1"`
matches this
pattern, causing `_check_dangerous_patterns` to raise `ValueError("Dashboard
title
contains potentially malicious URL scheme")`. This propagates to
`generate_dashboard`,
which catches `ValueError` in the `except` block at
`generate_dashboard.py:443-457` and
returns `GenerateDashboardResponse(dashboard=None, dashboard_url=None,
error="Failed to
create dashboard: Dashboard title contains potentially malicious URL
scheme")` instead of
creating the dashboard, demonstrating that benign titles containing
plain-text `data:` are
rejected.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/dashboard/schemas.py
**Line:** 456:458
**Comment:**
*Logic Error: Using `sanitize_user_input()` for dashboard titles
introduces a regression because it rejects benign text containing `data:` (and
similar scheme-like substrings), so valid titles such as "Sales data: Q1" now
fail validation. For dashboard titles, keep HTML/XSS stripping but avoid
URL-scheme blocking logic intended for SQL-sensitive inputs.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38990&comment_hash=9f650392dbafcd8e3807f5749f58e75281802e7009b9b81af0f4c882d71b6620&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38990&comment_hash=9f650392dbafcd8e3807f5749f58e75281802e7009b9b81af0f4c882d71b6620&reaction=dislike'>👎</a>
##########
superset/mcp_service/chart/tool/get_chart_preview.py:
##########
@@ -133,12 +133,17 @@ def generate(self) -> ASCIIPreview | ChartError:
groupby_columns = form_data.get("groupby", [])
metrics = form_data.get("metrics", [])
- columns = groupby_columns.copy()
- if x_axis_config and isinstance(x_axis_config, str):
- columns.append(x_axis_config)
- elif x_axis_config and isinstance(x_axis_config, dict):
- if "column_name" in x_axis_config:
- columns.append(x_axis_config["column_name"])
+ # Table charts in raw mode use all_columns
+ all_columns = form_data.get("all_columns", [])
+ if all_columns and form_data.get("query_mode") == "raw":
+ columns = list(all_columns)
Review Comment:
**Suggestion:** Raw-mode query building only checks `all_columns`, but many
raw chart payloads use `columns` (and may not populate `all_columns`). In those
cases this branch falls back to `groupby`/`x_axis`, producing an empty query
and triggering runtime "Empty query?" errors. Include `columns` as a raw-mode
fallback when `all_columns` is missing. [logic error]
<details>
<summary><b>Severity Level:</b> Major ⚠️</summary>
```mdx
- ❌ ASCII preview fails for raw charts using columns only.
- ⚠️ MCP clients lose ASCII/tabular preview for such charts.
```
</details>
```suggestion
# Raw-mode charts may provide either all_columns or columns
raw_columns = form_data.get("all_columns") or
form_data.get("columns") or []
if form_data.get("query_mode") == "raw" and raw_columns:
columns = list(raw_columns)
```
<details>
<summary><b>Steps of Reproduction ✅ </b></summary>
```mdx
1. Configure a chart in raw query mode whose form_data only sets `"columns"`
and not
`"all_columns"`, for example a chart type that follows the documented
raw-mode semantics
of `"query_mode": "raw"` with `"columns"` (the MCP preview utilities
explicitly support
this pattern via `_build_query_columns` in
`superset/mcp_service/chart/preview_utils.py:40-59`, which uses `raw_columns
=
form_data.get("columns") or []` and falls back to it when `all_columns` is
empty).
2. Ensure this chart is stored in the charts table so it can be loaded by
`ChartDAO.find_by_id` and that its `params` JSON therefore contains a
payload like
`{"viz_type": "...", "query_mode": "raw", "columns": ["col1", "col2"], ...}`
without an
`"all_columns"` key; this is the exact form_data that
`ASCIIPreviewStrategy.generate` in
`superset/mcp_service/chart/tool/get_chart_preview.py` deserializes from
`self.chart.params`.
3. From an MCP client, call the `get_chart_preview` tool (entrypoint
`superset/mcp_service/chart/tool/get_chart_preview.py:get_chart_preview`)
with
`request.identifier` set to that chart's ID and `request.format` set to
`"ascii"`, causing
`_get_chart_preview_internal` to construct an `ASCIIPreviewStrategy` and
invoke its
`generate()` method.
4. Inside `ASCIIPreviewStrategy.generate()` the code at
`superset/mcp_service/chart/tool/get_chart_preview.py:136-146` sets
`all_columns =
form_data.get("all_columns", [])` which is `[]` for this payload, sees
`query_mode ==
"raw"` but `all_columns` empty, and so falls into the `else` branch where
`columns` is
built only from `groupby` and `x_axis` (both absent for raw-mode charts using
`"columns"`). This leaves `columns = []`, and the subsequent
`QueryContextFactory().create(..., queries=[{"columns": columns, "metrics":
metrics,
...}])` plus `ChartDataCommand.validate()` (same file, lines 29-50 in the
snippet) raises
a Superset "Empty query?" error that is caught and returned as
`ChartError(error_type="ASCIIError")`, so the MCP caller receives `"Failed
to generate
ASCII preview: Empty query?"` instead of a usable ASCII preview. With the
suggested
change, the same call would instead set `raw_columns =
form_data.get("all_columns") or
form_data.get("columns") or []` and, in raw mode, populate `columns` from
`"columns"` so
the query validates and returns data.
```
</details>
<details>
<summary><b>Prompt for AI Agent 🤖 </b></summary>
```mdx
This is a comment left during a code review.
**Path:** superset/mcp_service/chart/tool/get_chart_preview.py
**Line:** 136:139
**Comment:**
*Logic Error: Raw-mode query building only checks `all_columns`, but
many raw chart payloads use `columns` (and may not populate `all_columns`). In
those cases this branch falls back to `groupby`/`x_axis`, producing an empty
query and triggering runtime "Empty query?" errors. Include `columns` as a
raw-mode fallback when `all_columns` is missing.
Validate the correctness of the flagged issue. If correct, How can I resolve
this? If you propose a fix, implement it and please make it concise.
```
</details>
<a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38990&comment_hash=c1deb89e06850263710ffbab0ff27961accf164ae937f7227da91e0a12751dae&reaction=like'>👍</a>
| <a
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38990&comment_hash=c1deb89e06850263710ffbab0ff27961accf164ae937f7227da91e0a12751dae&reaction=dislike'>👎</a>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]