codeant-ai-for-open-source[bot] commented on code in PR #38990:
URL: https://github.com/apache/superset/pull/38990#discussion_r3018999302


##########
superset/mcp_service/dashboard/schemas.py:
##########
@@ -448,6 +449,14 @@ class GenerateDashboardRequest(BaseModel):
         default=True, description="Whether to publish the dashboard"
     )
 
+    @field_validator("dashboard_title")
+    @classmethod
+    def sanitize_dashboard_title(cls, v: str | None) -> str | None:
+        """Sanitize dashboard title to prevent XSS attacks."""
+        return sanitize_user_input(
+            v, "Dashboard title", max_length=255, allow_empty=True
+        )

Review Comment:
   **Suggestion:** Using `sanitize_user_input()` for dashboard titles 
introduces a regression because it rejects benign text containing `data:` (and 
similar scheme-like substrings), so valid titles such as "Sales data: Q1" now 
fail validation. For dashboard titles, keep HTML/XSS stripping but avoid 
URL-scheme blocking logic intended for SQL-sensitive inputs. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ MCP `generate_dashboard` tool rejects some harmless dashboard titles.
   - ⚠️ Users must avoid common patterns like "Sales data: Q1".
   - ⚠️ Automated agents may see unexpected validation failures on title.
   ```
   </details>
   
   ```suggestion
           if v is None:
               return None
   
           value = v.strip()
           if not value:
               return None
   
           if len(value) > 255:
               raise ValueError(
                   f"Dashboard title too long ({len(value)} characters). "
                   "Maximum allowed length is 255 characters."
               )
   
           import html
           import nh3
   
           max_iterations = 100
           decoded = value
           prev = None
           iterations = 0
           while prev != decoded and iterations < max_iterations:
               prev = decoded
               decoded = html.unescape(decoded)
               iterations += 1
   
           return nh3.clean(decoded, tags=set(), 
url_schemes=set()).replace("&amp;", "&")
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Call the MCP tool function `generate_dashboard` defined in
   `superset/mcp_service/dashboard/tool/generate_dashboard.py:190-441` via the 
MCP service
   (this is the entry point decorated with `@tool`, taking a 
`GenerateDashboardRequest`
   request object).
   
   2. Provide a JSON payload that deserializes into `GenerateDashboardRequest` 
(schema in
   `superset/mcp_service/dashboard/schemas.py:35-59`) with at least one valid 
chart ID in
   `chart_ids` and a title such as `"Sales data: Q1"` in the `dashboard_title` 
field.
   
   3. During request parsing, Pydantic constructs a `GenerateDashboardRequest` 
instance and
   runs the `dashboard_title` field validator `sanitize_dashboard_title` 
implemented at
   `superset/mcp_service/dashboard/schemas.py:53-59`, which delegates to
   `sanitize_user_input(v, "Dashboard title", max_length=255, 
allow_empty=True)`.
   
   4. Inside `sanitize_user_input` in 
`superset/mcp_service/utils/sanitization.py:145-209`,
   after trimming and length checks, `_check_dangerous_patterns` is called at
   `sanitization.py:199-200`, which uses the regex 
`r"\b(javascript|vbscript|data):"`
   (`sanitization.py:93-95`). The substring `"data:"` in `"Sales data: Q1"` 
matches this
   pattern, causing `_check_dangerous_patterns` to raise `ValueError("Dashboard 
title
   contains potentially malicious URL scheme")`. This propagates to 
`generate_dashboard`,
   which catches `ValueError` in the `except` block at 
`generate_dashboard.py:443-457` and
   returns `GenerateDashboardResponse(dashboard=None, dashboard_url=None, 
error="Failed to
   create dashboard: Dashboard title contains potentially malicious URL 
scheme")` instead of
   creating the dashboard, demonstrating that benign titles containing 
plain-text `data:` are
   rejected.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/dashboard/schemas.py
   **Line:** 456:458
   **Comment:**
        *Logic Error: Using `sanitize_user_input()` for dashboard titles 
introduces a regression because it rejects benign text containing `data:` (and 
similar scheme-like substrings), so valid titles such as "Sales data: Q1" now 
fail validation. For dashboard titles, keep HTML/XSS stripping but avoid 
URL-scheme blocking logic intended for SQL-sensitive inputs.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38990&comment_hash=9f650392dbafcd8e3807f5749f58e75281802e7009b9b81af0f4c882d71b6620&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38990&comment_hash=9f650392dbafcd8e3807f5749f58e75281802e7009b9b81af0f4c882d71b6620&reaction=dislike'>👎</a>



##########
superset/mcp_service/chart/tool/get_chart_preview.py:
##########
@@ -133,12 +133,17 @@ def generate(self) -> ASCIIPreview | ChartError:
             groupby_columns = form_data.get("groupby", [])
             metrics = form_data.get("metrics", [])
 
-            columns = groupby_columns.copy()
-            if x_axis_config and isinstance(x_axis_config, str):
-                columns.append(x_axis_config)
-            elif x_axis_config and isinstance(x_axis_config, dict):
-                if "column_name" in x_axis_config:
-                    columns.append(x_axis_config["column_name"])
+            # Table charts in raw mode use all_columns
+            all_columns = form_data.get("all_columns", [])
+            if all_columns and form_data.get("query_mode") == "raw":
+                columns = list(all_columns)

Review Comment:
   **Suggestion:** Raw-mode query building only checks `all_columns`, but many 
raw chart payloads use `columns` (and may not populate `all_columns`). In those 
cases this branch falls back to `groupby`/`x_axis`, producing an empty query 
and triggering runtime "Empty query?" errors. Include `columns` as a raw-mode 
fallback when `all_columns` is missing. [logic error]
   
   <details>
   <summary><b>Severity Level:</b> Major ⚠️</summary>
   
   ```mdx
   - ❌ ASCII preview fails for raw charts using columns only.
   - ⚠️ MCP clients lose ASCII/tabular preview for such charts.
   ```
   </details>
   
   ```suggestion
               # Raw-mode charts may provide either all_columns or columns
               raw_columns = form_data.get("all_columns") or 
form_data.get("columns") or []
               if form_data.get("query_mode") == "raw" and raw_columns:
                   columns = list(raw_columns)
   ```
   <details>
   <summary><b>Steps of Reproduction ✅ </b></summary>
   
   ```mdx
   1. Configure a chart in raw query mode whose form_data only sets `"columns"` 
and not
   `"all_columns"`, for example a chart type that follows the documented 
raw-mode semantics
   of `"query_mode": "raw"` with `"columns"` (the MCP preview utilities 
explicitly support
   this pattern via `_build_query_columns` in
   `superset/mcp_service/chart/preview_utils.py:40-59`, which uses `raw_columns 
=
   form_data.get("columns") or []` and falls back to it when `all_columns` is 
empty).
   
   2. Ensure this chart is stored in the charts table so it can be loaded by
   `ChartDAO.find_by_id` and that its `params` JSON therefore contains a 
payload like
   `{"viz_type": "...", "query_mode": "raw", "columns": ["col1", "col2"], ...}` 
without an
   `"all_columns"` key; this is the exact form_data that 
`ASCIIPreviewStrategy.generate` in
   `superset/mcp_service/chart/tool/get_chart_preview.py` deserializes from
   `self.chart.params`.
   
   3. From an MCP client, call the `get_chart_preview` tool (entrypoint
   `superset/mcp_service/chart/tool/get_chart_preview.py:get_chart_preview`) 
with
   `request.identifier` set to that chart's ID and `request.format` set to 
`"ascii"`, causing
   `_get_chart_preview_internal` to construct an `ASCIIPreviewStrategy` and 
invoke its
   `generate()` method.
   
   4. Inside `ASCIIPreviewStrategy.generate()` the code at
   `superset/mcp_service/chart/tool/get_chart_preview.py:136-146` sets 
`all_columns =
   form_data.get("all_columns", [])` which is `[]` for this payload, sees 
`query_mode ==
   "raw"` but `all_columns` empty, and so falls into the `else` branch where 
`columns` is
   built only from `groupby` and `x_axis` (both absent for raw-mode charts using
   `"columns"`). This leaves `columns = []`, and the subsequent
   `QueryContextFactory().create(..., queries=[{"columns": columns, "metrics": 
metrics,
   ...}])` plus `ChartDataCommand.validate()` (same file, lines 29-50 in the 
snippet) raises
   a Superset "Empty query?" error that is caught and returned as
   `ChartError(error_type="ASCIIError")`, so the MCP caller receives `"Failed 
to generate
   ASCII preview: Empty query?"` instead of a usable ASCII preview. With the 
suggested
   change, the same call would instead set `raw_columns = 
form_data.get("all_columns") or
   form_data.get("columns") or []` and, in raw mode, populate `columns` from 
`"columns"` so
   the query validates and returns data.
   ```
   </details>
   <details>
   <summary><b>Prompt for AI Agent 🤖 </b></summary>
   
   ```mdx
   This is a comment left during a code review.
   
   **Path:** superset/mcp_service/chart/tool/get_chart_preview.py
   **Line:** 136:139
   **Comment:**
        *Logic Error: Raw-mode query building only checks `all_columns`, but 
many raw chart payloads use `columns` (and may not populate `all_columns`). In 
those cases this branch falls back to `groupby`/`x_axis`, producing an empty 
query and triggering runtime "Empty query?" errors. Include `columns` as a 
raw-mode fallback when `all_columns` is missing.
   
   Validate the correctness of the flagged issue. If correct, How can I resolve 
this? If you propose a fix, implement it and please make it concise.
   ```
   </details>
   <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38990&comment_hash=c1deb89e06850263710ffbab0ff27961accf164ae937f7227da91e0a12751dae&reaction=like'>👍</a>
 | <a 
href='https://app.codeant.ai/feedback?pr_url=https%3A%2F%2Fgithub.com%2Fapache%2Fsuperset%2Fpull%2F38990&comment_hash=c1deb89e06850263710ffbab0ff27961accf164ae937f7227da91e0a12751dae&reaction=dislike'>👎</a>



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to